Fresh IDE . Check-in [d538990122]

This repository is a mirror!

The original is located on: https://fresh.flatassembler.net/fossil/repo/fresh
If you want to follow the project, please update your remote-url

Overview
Comment: Merged with the help_update branch. Almost ready for release.
SHA1: d538990122d8889928c5076186c7bff04c04a118
User & Date: johnfound 2017-10-26 11:05:29
Context
2017-10-26
17:20
Merged with FreshLibDev in order to get the latest version. check-in: 5d35c7bbd0 user: johnfound tags: trunk
11:05
Merged with the help_update branch. Almost ready for release. check-in: d538990122 user: johnfound tags: trunk
10:57
Merged with "newskin" branch. check-in: 142ab39cbf user: johnfound tags: trunk
2017-10-24
08:28
Updated the help system to allow raw headless html files and to decorate them in the same way as the .md files.

The extension of such files is .rhtm. The FASM LaTeX manual has been updated to the latest version and converted to an .rhtm file. The doc/tools/ directory was cleaned of outdated tools, and the LaTeX_to_html tool (tth) has been updated to the latest version. Note that doc/tools/tth_src contains a modified version of the tool that emits anchor tags in the form the Fresh IDE help system uses.

This help files update is part of the preparations for the new release of Fresh IDE, together with the "newskin" branch. check-in: c8ad2b898b user: johnfound tags: help_update

Changes

Name change from doc/FASM.html to doc/FASM.rhtm.

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN"
           "http://www.w3.org/TR/REC-html40/loose.dtd">
<html>
<head>
<meta name="GENERATOR" content="TtH 4.03">
<style type="text/css">
/* Common layout styles 				*/
/* They will define the common web site appearance	*/

body {
   font-family: Verdana, Helvetica, sans-serif;
   font-size: 16px;
   background-color: white;
}


div.content, body>div{
  max-width: 1024px;
  background-color: white;
  margin-left: auto;
  margin-right: auto;
  margin-top: 8px;
  padding: 8px;
}




/* Styles for the article formatting. */

table {
   border-left-width: 0px;
   border-top-width: 0px;
   border-right-width: 0px;
   border-bottom-width: 0px;

   border-style: solid;
   border-color: #606060;
   background-color: white;
   border-collapse: collapse;
   margin: auto;
}


table td {
   font-size: 16px;
   border-right-width: 1px;
   border-bottom-width: 1px;
   border-left-width: 1px;
   border-top-width: 1px;

   border-style: solid;
   border-color: #606060;
   padding-left: 4px;
   padding-right: 4px;
   padding-top: 1px;
   padding-bottom: 1px;
   margin: 0px;
}


h1 {
   color: #b9006e;
   font-size: 28px;
   text-align: center;
}


h2 {
   color: #b9006e;
  font-size: 24px;
  margin-top: 32px;
  margin-bottom: 16px;
  text-align: left;
}

h3 {
   color: #b9006e;
  font-size: 20px;
  margin-top: 32px;
  margin-bottom: 16px;
  text-align: left;
}

h4 {
   color: #b9006e;
  font-size: 16px;
  margin-top: 16px;
  margin-bottom: 12px;
  text-align: left;
}


p {
   text-align: justify;
   font-size: 16px;
   margin-top: 0.75em;
   margin-bottom: 0.75em;
}


p.uli {
  display: list-item;
  list-style: disc inside;
}


code, tt {
  font-family: "Courier New", monospace;
  font-size: 16px;
  font-weight: bold;

  padding: 0px 3px 0px 3px;
  text-indent: 0px;
}


div.code {
  padding: 1em;
  margin: 0.5em;

  font-family: "Courier New", monospace;
  font-size: 16px;
  font-weight: bold;

  overflow: auto;
  max-width: 150%;
  white-space: pre;

  background-color: white;
  border: 2px solid gray;

  clear: left;
}


img.txt {
  background-color:white;
  margin-left:auto;
  margin-right: auto;
  display: block;
  clear: both;
  max-width: 100%;
}

img.txt1 {
  background-color:white;
  clear: left;
  float: left;
  display: block;
  margin: 1em;
}


img.txt2 {
  background-color:white;
  clear: right;
  float: right;
  display: block;
  margin: 1em;
}

div.bq {
  border: 0px solid #808080;
  padding: 1em;
  padding-left: 2em;
  margin: 0.5em;

  background-color: #f8f8f8;
  opacity: 0.7;
  filter:alpha(opacity=70);

  -webkit-box-shadow: 3px 3px 5px #808080;
  -moz-box-shadow: 3px 3px 5px #808080;
  box-shadow: 3px 3px 5px #808080;

  clear: both;
}

/* Article anchors content */
a.a:before {
 text-decoration: none;
 display: inline-block;
 overflow: hidden;
 content: url(_images/anchor.gif);
 vertical-align: middle;
}


hr {
 margin-top: 1em;
 margin-bottom: 1em;
}


div.main_menu {
  background-color: #f05a28;

  max-width: 1024px;
  margin-left: auto;
  margin-right: auto;
  margin-top: 8px;

  padding: 8px;
}

div.main_menu a {
  color: white;
  padding: 5px;
  display: inline-block;
}

div.main_menu a:hover {
  color: #ffe600;
}

</style>

 <style type="text/css"> div.p { margin-top: 7pt;}</style>
 <style type="text/css"><!--
 td div.comp { margin-top: -0.6ex; margin-bottom: -1ex;}
 td div.comb { margin-top: -0.6ex; margin-bottom: -.6ex;}
 td div.hrcomp { line-height: 0.9; margin-top: -0.8ex; margin-bottom: -1ex;}
 td div.norm {line-height:normal;}
 span.roman {font-family: serif; font-style: normal; font-weight: normal;} 
 span.overacc2 {position: relative;  left: .8em; top: -1.2ex;}
 span.overacc1 {position: relative;  left: .6em; top: -1.2ex;} --></style>
 <style type="text/css"><!--
 .tiny {font-size:30%;}
 .scriptsize {font-size:xx-small;}
 .footnotesize {font-size:x-small;}
 .smaller {font-size:smaller;}
 .small {font-size:small;}
 .normalsize {font-size:medium;}
 .large {font-size:large;}
 .larger {font-size:x-large;}
 .largerstill {font-size:xx-large;}
 .huge {font-size:300%;}
 --></style>


<title>flat assembler 1.71</title>
</head>
<body><div>
<div class="p"><!----></div>

<h3 align="center">Tomasz Grysztar </h3>

<h1 align="center">flat assembler 1.71<br /><span class="small">Programmer's Manual</span> </h1>

<h3 align="center"> </h3>


<div class="p"><!----></div>
 <a id="tth_chAp1"></a><h1>
Chapter 1 <br />Introduction</h1>
................................................................................

<h4>Movement:</h4>

<div class="p"><!----></div>
<a id="tth_tAb1.1">
</a> 
<div style="text-align:center">      
<table>
<tr><td width="158">Left arrow       </td><td align="left">move one character left </td></tr>
<tr><td width="158">Right arrow      </td><td align="left">move one character right </td></tr>
<tr><td width="158">Up arrow         </td><td align="left">move one line up </td></tr>
<tr><td width="158">Down arrow       </td><td align="left">move one line down </td></tr>
<tr><td width="158">Ctrl+Left arrow  </td><td align="left">move one word left </td></tr>
<tr><td width="158">Ctrl+Right arrow </td><td align="left">move one word right </td></tr>
<tr><td width="158">Home             </td><td align="left">move to the beginning of line </td></tr>
................................................................................

<h4>Editing:</h4>

<div class="p"><!----></div>
<a id="tth_tAb1.1">
</a> 
<div style="text-align:center">      
<table>
<tr><td width="158">Insert         </td><td align="left">switch insert/overwrite mode </td></tr>
<tr><td width="158">Alt+Insert     </td><td align="left">switch horizontal/vertical blocks </td></tr>
<tr><td width="158">Delete         </td><td align="left">delete current character </td></tr>
<tr><td width="158">Backspace      </td><td align="left">delete previous character </td></tr>
<tr><td width="158">Ctrl+Backspace </td><td align="left">delete previous word </td></tr>
<tr><td width="158">Alt+Backspace  </td><td align="left">undo previous operation (also Ctrl+Z) </td></tr>

<tr><td width="158">Ctrl+Y         </td><td align="left">delete current line </td></tr>
<tr><td width="158">F6             </td><td align="left">duplicate current line </td></tr>
<tr><td width="158"></td></tr></table>
</div>
<div class="p"><!----></div>

<h4>Block operations:</h4>

<div class="p"><!----></div>
<a id="tth_tAb1.1">
</a> 
<div style="text-align:center">      
<table>
<tr><td width="158">Ctrl+Insert  </td><td align="left">copy block into clipboard (also Ctrl+C) </td></tr>
<tr><td width="158">Shift+Insert </td><td align="left">paste block from the clipboard (also Ctrl+V) </td></tr>
<tr><td width="158">Ctrl+Delete  </td><td align="left">delete block </td></tr>
<tr><td width="158">Shift+Delete </td><td align="left">cut block into clipboard (also Ctrl+X) </td></tr>
<tr><td width="158">Ctrl+A       </td><td align="left">select all text </td></tr>
<tr><td width="158"></td></tr></table>
</div>
................................................................................

<h4>Search:</h4>

<div class="p"><!----></div>
<a id="tth_tAb1.1">
</a> 
<div style="text-align:center">      
<table>
<tr><td width="158">F5       </td><td align="left">go to specified position (also Ctrl+G) </td></tr>
<tr><td width="158">F7       </td><td align="left">find (also Ctrl+F) </td></tr>
<tr><td width="158">Shift+F7 </td><td align="left">find next (also F3) </td></tr>
<tr><td width="158">Ctrl+F7  </td><td align="left">replace (also Ctrl+H) </td></tr>
<tr><td width="158"></td></tr></table>
</div>
<div class="p"><!----></div>
................................................................................

<h4>Compile:</h4>

<div class="p"><!----></div>
<a id="tth_tAb1.1">
</a> 
<div style="text-align:center">      
<table>
<tr><td width="158">F9       </td><td align="left">compile and run </td></tr>
<tr><td width="158">Ctrl+F9  </td><td align="left">compile only </td></tr>
<tr><td width="158">Shift+F9 </td><td align="left">assign current file as main file to compile </td></tr>
<tr><td width="158">Ctrl+F8  </td><td align="left">compile and build symbols information </td></tr>
<tr><td width="158"></td></tr></table>
</div>
<div class="p"><!----></div>
................................................................................

<h4>Other keys:</h4>

<div class="p"><!----></div>
<a id="tth_tAb1.1">
</a> 
<div style="text-align:center">      
<table>
<tr><td width="158">F2              </td><td align="left">save current file </td></tr>
<tr><td width="158">Shift+F2        </td><td align="left">save file under a new name </td></tr>
<tr><td width="158">F4              </td><td align="left">load file </td></tr>
<tr><td width="158">Ctrl+N          </td><td align="left">create new file </td></tr>
<tr><td width="158">Ctrl+Tab        </td><td align="left">switch to next file </td></tr>
<tr><td width="158">Ctrl+Shift+Tab  </td><td align="left">switch to previous file </td></tr>
<tr><td width="158">Alt+[1-9]       </td><td align="left">switch to file of given number </td></tr>
................................................................................
<tr><td width="158">Esc             </td><td align="left">close current file </td></tr>
<tr><td width="158">Alt+X           </td><td align="left">close all files and exit </td></tr>
<tr><td width="158">Ctrl+F6         </td><td align="left">calculator </td></tr>
<tr><td width="158">Alt+Left arrow  </td><td align="left">scroll left </td></tr>
<tr><td width="158">Alt+Right arrow </td><td align="left">scroll right </td></tr>
<tr><td width="158">Alt+Up arrow    </td><td align="left">scroll up </td></tr>
<tr><td width="158">Alt+Down arrow  </td><td align="left">scroll down </td></tr>
<tr><td width="158">Alt+Delete      </td><td align="left">discard undo information </td></tr>
<tr><td width="158"></td></tr></table>
</div>
<div class="p"><!----></div>

<h4>Specific keys:</h4>

<div class="p"><!----></div>
<a id="tth_tAb1.1">
</a> 
<div style="text-align:center">      
<table>
<tr><td width="158">F1     </td><td align="left">search for keyword in selected help file </td></tr>
<tr><td width="158">Alt+F1 </td><td align="left">contents of selected help file </td></tr>
<tr><td width="158"></td></tr></table>
</div>
<div class="p"><!----></div>
     <a id="tth_sEc1.1.4"></a><h3>
1.1.4&nbsp;&nbsp;Editor options</h3>
................................................................................
editor the so-called
dead keys (keys that don't immediately generate the character, but wait for the next key
to decide what character to put - usually you enter the character of a dead key by
pressing the space key after it). It may be useful if the key for entering some of the characters that
you often need to enter into assembly source is a dead key and you don't need this
functionality for writing programs.





<div class="p"><!----></div>
     <a id="tth_sEc1.1.5"></a><h3>
1.1.5&nbsp;&nbsp;Executing compiler from command line</h3>
To perform compilation from the command line you need to execute
the <tt>fasm.exe</tt> executable, providing two parameters - the first
should be the name of the source file, the second the name of the
destination file. If no second parameter is given, the name for
................................................................................
As it is stated above, after the successful compilation, the
compiler displays the compilation summary. It includes the
information on how many passes were done, how much time it took,
and how many bytes were written into the destination file. The
following is an example of the compilation summary:

<pre>
flat&nbsp;assembler&nbsp;&nbsp;version&nbsp;1.70&nbsp;(16384&nbsp;kilobytes&nbsp;memory)
38&nbsp;passes,&nbsp;5.3&nbsp;seconds,&nbsp;77824&nbsp;bytes.

</pre>
In case of error during the compilation process, the program will
display an error message. For example, when the compiler can't find
the input file, it will display the following message:

<pre>
flat&nbsp;assembler&nbsp;&nbsp;version&nbsp;1.70&nbsp;(16384&nbsp;kilobytes&nbsp;memory)
error:&nbsp;source&nbsp;file&nbsp;not&nbsp;found.

</pre>
If the error is connected with a specific part of source code, the
source line that caused the error will also be displayed, together
with the placement of this line in the source to help you find
the error, for example:

<pre>
flat&nbsp;assembler&nbsp;&nbsp;version&nbsp;1.70&nbsp;(16384&nbsp;kilobytes&nbsp;memory)
example.asm&nbsp;[3]:
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;mob&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;ax,1
error:&nbsp;illegal&nbsp;instruction.

</pre>
It means that in the third line of the <tt>example.asm</tt> file
the compiler has encountered an unrecognized instruction. When the
line that caused the error contains a macroinstruction, the line
in the macroinstruction definition that generated the erroneous
instruction is also displayed:

<pre>
flat&nbsp;assembler&nbsp;&nbsp;version&nbsp;1.70&nbsp;(16384&nbsp;kilobytes&nbsp;memory)
example.asm&nbsp;[6]:
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;stoschar&nbsp;7
example.asm&nbsp;[3]&nbsp;stoschar&nbsp;[1]:
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;mob&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;al,char
error:&nbsp;illegal&nbsp;instruction.

</pre>
................................................................................

<div class="p"><!----></div>

<div class="p"><!----></div>
<a id="tth_tAb1.1">
</a> 
<div style="text-align:center">
<table border="1">
<tr><td align="center">Operator </td><td align="center">Bits </td><td align="center">Bytes </td></tr><tr><td></td></tr>
<tr><td align="center"><tt>byte</tt> </td><td align="center">8 </td><td align="center">1 </td></tr>
<tr><td align="center"><tt>word</tt> </td><td align="center">16 </td><td align="center">2 </td></tr>
<tr><td align="center"><tt>dword</tt> </td><td align="center">32 </td><td align="center">4 </td></tr>
<tr><td align="center"><tt>fword</tt> </td><td align="center">48 </td><td align="center">6 </td></tr>
<tr><td align="center"><tt>pword</tt> </td><td align="center">48 </td><td align="center">6 </td></tr>
<tr><td align="center"><tt>qword</tt> </td><td align="center">64 </td><td align="center">8 </td></tr>
................................................................................

<div class="p"><!----></div>

<div class="p"><!----></div>
<a id="tth_tAb1.2">
</a> 
<div style="text-align:center">
<table border="1">
<tr><td align="center">Type </td><td align="center">Bits </td><td align="center"></td><td align="center"></td><td align="center"></td><td align="center"></td><td align="center"></td><td align="center"></td><td align="center"></td><td align="center"></td></tr><tr><td></td></tr>
<tr><td align="center"></td><td align="center">8 </td><td align="center"><tt>al</tt> </td><td align="center"><tt>cl</tt> </td><td align="center"><tt>dl</tt> </td><td align="center"><tt>bl</tt> </td><td align="center"><tt>ah</tt> </td><td align="center"><tt>ch</tt> </td><td align="center"><tt>dh</tt> </td><td align="center"><tt>bh</tt> </td></tr>
<tr><td align="center">General </td><td align="center">16 </td><td align="center"><tt>ax</tt> </td><td align="center"><tt>cx</tt> </td><td align="center"><tt>dx</tt> </td><td align="center"><tt>bx</tt> </td><td align="center"><tt>sp</tt> </td><td align="center"><tt>bp</tt> </td><td align="center"><tt>si</tt> </td><td align="center"><tt>di</tt> </td></tr>
<tr><td align="center"></td><td align="center">32 </td><td align="center"><tt>eax</tt> </td><td align="center"><tt>ecx</tt> </td><td align="center"><tt>edx</tt> </td><td align="center"><tt>ebx</tt> </td><td align="center"><tt>esp</tt> </td><td align="center"><tt>ebp</tt> </td><td align="center"><tt>esi</tt> </td><td align="center"><tt>edi</tt> </td></tr>
<tr><td align="center">Segment </td><td align="center">16 </td><td align="center"><tt>es</tt> </td><td align="center"><tt>cs</tt> </td><td align="center"><tt>ss</tt> </td><td align="center"><tt>ds</tt> </td><td align="center"><tt>fs</tt> </td><td align="center"><tt>gs</tt> </td><td align="center"></td><td align="center"></td></tr>
<tr><td align="center">Control </td><td align="center">32 </td><td align="center"><tt>cr0</tt> </td><td align="center"></td><td align="center"><tt>cr2</tt> </td><td align="center"><tt>cr3</tt> </td><td align="center"><tt>cr4</tt> </td><td align="center"></td><td align="center"></td><td align="center"></td></tr>
<tr><td align="center">Debug </td><td align="center">32 </td><td align="center"><tt>dr0</tt> </td><td align="center"><tt>dr1</tt> </td><td align="center"><tt>dr2</tt> </td><td align="center"><tt>dr3</tt> </td><td align="center"></td><td align="center"></td><td align="center"><tt>dr6</tt> </td><td align="center"><tt>dr7</tt> </td></tr>
<tr><td align="center">FPU </td><td align="center">80 </td><td align="center"><tt>st0</tt> </td><td align="center"><tt>st1</tt> </td><td align="center"><tt>st2</tt> </td><td align="center"><tt>st3</tt> </td><td align="center"><tt>st4</tt> </td><td align="center"><tt>st5</tt> </td><td align="center"><tt>st6</tt> </td><td align="center"><tt>st7</tt> </td></tr>
<tr><td align="center">MMX </td><td align="center">64 </td><td align="center"><tt>mm0</tt> </td><td align="center"><tt>mm1</tt> </td><td align="center"><tt>mm2</tt> </td><td align="center"><tt>mm3</tt> </td><td align="center"><tt>mm4</tt> </td><td align="center"><tt>mm5</tt> </td><td align="center"><tt>mm6</tt> </td><td align="center"><tt>mm7</tt> </td></tr>
<tr><td align="center">SSE </td><td align="center">128 </td><td align="center"><tt>xmm0</tt> </td><td align="center"><tt>xmm1</tt> </td><td align="center"><tt>xmm2</tt> </td><td align="center"><tt>xmm3</tt> </td><td align="center"><tt>xmm4</tt> </td><td align="center"><tt>xmm5</tt> </td><td align="center"><tt>xmm6</tt> </td><td align="center"><tt>xmm7</tt> </td></tr>
<tr><td align="center">AVX </td><td align="center">256 </td><td align="center"><tt>ymm0</tt> </td><td align="center"><tt>ymm1</tt> </td><td align="center"><tt>ymm2</tt> </td><td align="center"><tt>ymm3</tt> </td><td align="center"><tt>ymm4</tt> </td><td align="center"><tt>ymm5</tt> </td><td align="center"><tt>ymm6</tt> </td><td align="center"><tt>ymm7</tt> </td></tr></table>



</div>

<div style="text-align:center">Table 1.2: Registers.</div>
<a id="tab:registers">
</a>

<div class="p"><!----></div>
................................................................................
addressing, segment register name followed with a colon should be put just
before the address value (inside the square brackets or after the <tt>ptr</tt>
operator).

<div class="p"><!----></div>
     <a id="tth_sEc1.2.2"></a><h3>
1.2.2&nbsp;&nbsp;Data definitions</h3>
<a 
id="DB121"></a><a 
id="RB122"></a><a 
id="DW123"></a><a 
id="DU124"></a><a 
id="RW125"></a><a 
id="DP126"></a><a 
id="RP127"></a><a 
id="DF128"></a><a 
id="RF129"></a>

<a 
id="DD1210"></a><a 
id="RD1211"></a><a 
id="DQ1212"></a><a 
id="RQ1213"></a><a 
id="DT1214"></a><a 
id="RT1215"></a>

To define data or reserve a space for it, use one of the directives listed
in table 1.3. The data definition directive should be
followed by one or more numerical expressions, separated with commas.
These expressions define the values for data cells of a size depending on which
directive is used. For example <tt>db&nbsp;1,2,3</tt> will define three bytes with the
values 1, 2 and 3 respectively.
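
<div class="p"><!----></div>
As a minimal sketch of such definitions (the label names <tt>b_data</tt>, <tt>w_data</tt> and <tt>d_data</tt> are hypothetical and chosen only for illustration), data cells of different sizes could be defined like this:

<pre>
b_data&nbsp;db&nbsp;1,2,3&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;;&nbsp;three&nbsp;cells&nbsp;of&nbsp;one&nbsp;byte&nbsp;each
w_data&nbsp;dw&nbsp;1000h,2000h&nbsp;&nbsp;;&nbsp;two&nbsp;cells&nbsp;of&nbsp;two&nbsp;bytes&nbsp;each
d_data&nbsp;dd&nbsp;12345678h&nbsp;&nbsp;&nbsp;&nbsp;;&nbsp;one&nbsp;cell&nbsp;of&nbsp;four&nbsp;bytes

</pre>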

................................................................................
make multiple copies of given values. The count of duplicates should precede
this operator and the value to duplicate should follow - it can even be a
chain of values separated with commas, but such a set of values needs to be
enclosed in parentheses, like <tt>db&nbsp;5&nbsp;dup&nbsp;(1,2)</tt>, which defines five copies
of the given two-byte sequence.
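
<div class="p"><!----></div>
To make the effect of <tt>dup</tt> concrete, the following sketch shows two definitions that, going by the rule above, should emit the same ten bytes:

<pre>
&nbsp;&nbsp;&nbsp;&nbsp;db&nbsp;5&nbsp;dup&nbsp;(1,2)&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;;&nbsp;five&nbsp;copies&nbsp;of&nbsp;the&nbsp;sequence&nbsp;1,2
&nbsp;&nbsp;&nbsp;&nbsp;db&nbsp;1,2,1,2,1,2,1,2,1,2&nbsp;&nbsp;;&nbsp;the&nbsp;same&nbsp;bytes&nbsp;written&nbsp;out&nbsp;by&nbsp;hand

</pre>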

<div class="p"><!----></div>
<a 
id="FILE1216"></a>The <tt>file</tt> is a special directive and its syntax is different. This
directive includes a chain of bytes from a file and it should be followed by
the quoted file name, then optionally a colon and a numerical expression specifying
the offset in the file, then - also optionally - a comma and a numerical
expression specifying the count of bytes to include (if no count is specified,
all data up to the end of the file is included). For example <tt>file&nbsp;'data.bin'</tt> will
include the whole file as binary data and <tt>file&nbsp;'data.bin':10h,4</tt> will include
only four bytes starting at offset 10h.
................................................................................

<div class="p"><!----></div>

<div class="p"><!----></div>
<a id="tth_tAb1.3">
</a> 
<div style="text-align:center">
<table border="1">
<tr><td align="center">Size </td><td align="center">Define </td><td align="center">Reserve </td></tr>
<tr><td align="center">(bytes) </td><td align="center">data </td><td align="center">data </td></tr><tr><td></td></tr>
<tr><td align="center">1 </td><td align="center"><tt>db</tt> </td><td align="center"><tt>rb</tt> </td></tr>
<tr><td align="center"></td><td align="center"><tt>file</tt> </td><td align="center"></td></tr>
<tr><td align="center">2 </td><td align="center"><tt>dw</tt> </td><td align="center"><tt>rw</tt> </td></tr>
<tr><td align="center"></td><td align="center"><tt>du</tt> </td><td align="center"></td></tr>
<tr><td align="center">4 </td><td align="center"><tt>dd</tt> </td><td align="center"><tt>rd</tt> </td></tr>
................................................................................
any place in the source (even before it was defined). A constant can be redefined
many times, but in this case it is accessible only after it was defined, and
is always equal to the value from the last definition before the place where it's
used. When a constant is defined only once in the source, it is - like the label -
accessible from anywhere.

<div class="p"><!----></div>
<a 
id="_1217"></a>The definition of a constant consists of the name of the constant followed by the
<tt>=</tt> character and a numerical expression, which after calculation will
become the value of the constant. This value is always calculated at the time the
constant is defined. For example you can define the <tt>count</tt> constant by
using the directive <tt>count&nbsp;=&nbsp;17</tt>, and then use it in assembly
instructions, like <tt>mov&nbsp;cx,count</tt> - which will become <tt>mov&nbsp;cx,17</tt>
during the compilation process.
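
<div class="p"><!----></div>
The following sketch illustrates a constant that is defined more than once; the values and the choice of registers are arbitrary and serve only as an example. Each use sees the value from the last definition that precedes it:

<pre>
&nbsp;&nbsp;&nbsp;&nbsp;count&nbsp;=&nbsp;1
&nbsp;&nbsp;&nbsp;&nbsp;mov&nbsp;cx,count&nbsp;&nbsp;&nbsp;&nbsp;;&nbsp;assembled&nbsp;as&nbsp;mov&nbsp;cx,1
&nbsp;&nbsp;&nbsp;&nbsp;count&nbsp;=&nbsp;2
&nbsp;&nbsp;&nbsp;&nbsp;mov&nbsp;dx,count&nbsp;&nbsp;&nbsp;&nbsp;;&nbsp;assembled&nbsp;as&nbsp;mov&nbsp;dx,2

</pre>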

................................................................................
compares the sizes of operands, which should be equal. You can force
assembling that instruction by using size override:
<tt>mov&nbsp;ax,word&nbsp;[char]</tt>, but remember that this instruction will read the
two bytes beginning at the <tt>char</tt> address, although it was defined as a single
byte.

<div class="p"><!----></div>
<a 
id="LABEL1218"></a>The last and the most flexible way to define labels is to use the <tt>label</tt>
directive. This directive should be followed by the name of the label, then
optionally a size operator and then - also optionally - the <tt>at</tt> operator and
a numerical expression defining the address at which this label should be
defined. For example <tt>label&nbsp;wchar&nbsp;word&nbsp;at&nbsp;char</tt> will define a new label
for the 16-bit data at the address of <tt>char</tt>. Now the instruction
<tt>mov&nbsp;ax,[wchar]</tt> will, after compilation, be the same as
<tt>mov&nbsp;ax,word&nbsp;[char]</tt>. If no address is specified, <tt>label</tt> directive
................................................................................
constants or labels. But they can be more complex, by using the arithmetical
or logical operators for calculations at compile time. All these operators
with their priority values are listed in table 1.4.
The operations with higher priority value will be calculated first; you can
of course change this behavior by putting some parts of the expression into
parentheses. The <tt>+</tt>, <tt>-</tt>, <tt>*</tt> and <tt>/</tt> are standard
arithmetical operations; <tt>mod</tt> calculates the remainder from division.
The <tt>and</tt>, <tt>or</tt>, <tt>xor</tt>, <tt>shl</tt>, <tt>shr</tt> and <tt>not</tt>
perform the same logical operations as assembly instructions of those names.
The <tt>rva</tt> and <tt>plt</tt> are special unary operators that perform
conversions between different kinds of addresses; they can be used only with
a few of the output formats and their meaning may vary (see ).

<div class="p"><!----></div>
The arithmetical and logical calculations are usually processed as if they
operated on infinite precision 2-adic numbers, and the assembler signals an
overflow error if because of its limitations it is not able to perform the
required calculation, or if the result is too large a number to fit in either
the signed or unsigned range for the destination unit size. However, the <tt>not</tt>, <tt>xor</tt>
and <tt>shr</tt> operators are exceptions to this rule - if the value specified
by a numerical expression has to fit in a unit of specified size, and the
arguments for the operation fit into that size, the operation will be performed
with precision limited to that size.

<div class="p"><!----></div>

<div class="p"><!----></div>
<a id="tth_tAb1.4">
</a> 
<div style="text-align:center">
<table border="1">
<tr><td align="center">Priority </td><td align="center">Operators </td></tr><tr><td></td></tr>
<tr><td align="center">0 </td><td align="center"><tt>+</tt> </td></tr>
<tr><td align="center"></td><td align="center"><tt>-</tt> </td></tr>
<tr><td align="center">1 </td><td align="center"><tt>*</tt> </td></tr>
<tr><td align="center"></td><td align="center"><tt>/</tt> </td></tr>
<tr><td align="center">2 </td><td align="center"><tt>mod</tt> </td></tr>
<tr><td align="center">3 </td><td align="center"><tt>and</tt> </td></tr>
<tr><td align="center"></td><td align="center"><tt>or</tt> </td></tr>
<tr><td align="center"></td><td align="center"><tt>xor</tt> </td></tr>
<tr><td align="center">4 </td><td align="center"><tt>shl</tt> </td></tr>
<tr><td align="center"></td><td align="center"><tt>shr</tt> </td></tr>
<tr><td align="center">5 </td><td align="center"><tt>not</tt> </td></tr>
<tr><td align="center">6 </td><td align="center"><tt>rva</tt> </td></tr>


<tr><td align="center"></td><td align="center"><tt>plt</tt> </td></tr></table>
</div>

<div style="text-align:center">Table 1.4: Arithmetical and logical operators by priority.</div>
<a id="tab:operators_priority">
</a>
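
<div class="p"><!----></div>
As a short illustration of the priorities in Table 1.4, the sketch below defines three constants; the names <tt>a</tt>, <tt>b</tt> and <tt>c</tt> are hypothetical, and the values in the comments are what the priority rules in the table imply:

<pre>
&nbsp;&nbsp;&nbsp;&nbsp;a&nbsp;=&nbsp;2+3*4&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;;&nbsp;*&nbsp;is&nbsp;calculated&nbsp;before&nbsp;+,&nbsp;so&nbsp;a&nbsp;=&nbsp;14
&nbsp;&nbsp;&nbsp;&nbsp;b&nbsp;=&nbsp;(2+3)*4&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;;&nbsp;parentheses&nbsp;force&nbsp;the&nbsp;addition&nbsp;first,&nbsp;so&nbsp;b&nbsp;=&nbsp;20
&nbsp;&nbsp;&nbsp;&nbsp;c&nbsp;=&nbsp;1&nbsp;shl&nbsp;4&nbsp;or&nbsp;1&nbsp;&nbsp;;&nbsp;shl&nbsp;is&nbsp;calculated&nbsp;before&nbsp;or,&nbsp;so&nbsp;c&nbsp;=&nbsp;17

</pre>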

<div class="p"><!----></div>
The numbers in the expression are by default treated as decimal; binary
numbers should have the <tt>b</tt> letter attached at the end, octal numbers
should end with the <tt>o</tt> letter, hexadecimal numbers should begin with <tt>0x</tt> characters
................................................................................
of the segment register is also a mnemonic of instruction prefix, although it
is recommended to use segment overrides inside the square brackets instead of
these prefixes.

<div class="p"><!----></div>
     <a id="tth_sEc2.1.1"></a><h3>
2.1.1&nbsp;&nbsp;Data movement instructions</h3>
<a 
id="mov2119"></a>
<tt>mov</tt> transfers a byte, word or double word from the source operand to
the destination operand. It can transfer data between general registers, from
the general register to memory, or from memory to general register, but it
cannot move from memory to memory. It can also transfer an immediate value to
general register or memory, segment register to general register or memory,
general register or memory to segment register, control or debug register to
general register and general register to control or debug register. The
................................................................................
&nbsp;&nbsp;&nbsp;&nbsp;mov&nbsp;ds,[bx]&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;;&nbsp;memory&nbsp;to&nbsp;segment&nbsp;register
&nbsp;&nbsp;&nbsp;&nbsp;mov&nbsp;eax,cr0&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;;&nbsp;control&nbsp;register&nbsp;to&nbsp;general&nbsp;register
&nbsp;&nbsp;&nbsp;&nbsp;mov&nbsp;cr3,ebx&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;;&nbsp;general&nbsp;register&nbsp;to&nbsp;control&nbsp;register

</pre>

<div class="p"><!----></div>
<a 
id="xchg2120"></a><tt>xchg</tt> swaps the contents of two operands. It can swap two byte
operands, two word operands or two double word operands. Order of operands is
not important. The operands may be two general registers, or general register
with memory. For example:

<pre>
&nbsp;&nbsp;&nbsp;&nbsp;xchg&nbsp;ax,bx&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;;&nbsp;swap&nbsp;two&nbsp;general&nbsp;registers
&nbsp;&nbsp;&nbsp;&nbsp;xchg&nbsp;al,[char]&nbsp;&nbsp;;&nbsp;swap&nbsp;register&nbsp;with&nbsp;memory

</pre>

<div class="p"><!----></div>
<a 
id="push2121"></a><a 
id="pushw2122"></a><a 
id="pushd2123"></a><tt>push</tt> decrements the stack pointer (<tt>esp</tt> register), then
transfers the operand to the top of stack indicated by <tt>esp</tt>. The
operand can be memory, general register, segment register or immediate value
of word or double word size. If operand is an immediate value and no size is
specified, it is by default treated as a word value if assembler is in
16-bit mode and as a double word value if assembler is in 32-bit mode.
<tt>pushw</tt> and <tt>pushd</tt> mnemonics are variants of this instruction that
store the values of word or double word size respectively. If more operands
................................................................................
&nbsp;&nbsp;&nbsp;&nbsp;push&nbsp;es&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;;&nbsp;store&nbsp;segment&nbsp;register
&nbsp;&nbsp;&nbsp;&nbsp;pushw&nbsp;[bx]&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;;&nbsp;store&nbsp;memory
&nbsp;&nbsp;&nbsp;&nbsp;push&nbsp;1000h&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;;&nbsp;store&nbsp;immediate&nbsp;value

</pre>

<div class="p"><!----></div>
<a 
id="pusha2124"></a><a 
id="pushaw2125"></a><a 
id="pushad2126"></a><tt>pusha</tt> saves the contents of the eight general registers on the stack.
This instruction has no operands. There are two versions of this instruction,
one 16-bit and one 32-bit; the assembler automatically generates the right
version for the current mode, but it can be overridden by using the <tt>pushaw</tt>
or <tt>pushad</tt> mnemonic to always get the 16-bit or 32-bit version.
The 16-bit version of this instruction pushes general registers on the stack
in the following order: <tt>ax</tt>, <tt>cx</tt>, <tt>dx</tt>, <tt>bx</tt>, the
initial value of <tt>sp</tt> before <tt>ax</tt> was pushed, <tt>bp</tt>, <tt>si</tt>
and <tt>di</tt>. The 32-bit version pushes equivalent 32-bit general
registers in the same order.

<div class="p"><!----></div>
<a 
id="pop2127"></a><a 
id="popw2128"></a><a 
id="popd2129"></a><tt>pop</tt> transfers the word or double word at the current top of stack to
the destination operand, and then increments <tt>esp</tt> to point to the new
top of stack. The operand can be memory, general register or segment
register. <tt>popw</tt> and <tt>popd</tt> mnemonics are variants of this
instruction for restoring the values of word or double word size respectively.
If more operands separated with spaces follow in the same line, the compiler will
assemble a chain of <tt>pop</tt> instructions with these operands.

................................................................................
&nbsp;&nbsp;&nbsp;&nbsp;pop&nbsp;bx&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;;&nbsp;restore&nbsp;general&nbsp;register
&nbsp;&nbsp;&nbsp;&nbsp;pop&nbsp;ds&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;;&nbsp;restore&nbsp;segment&nbsp;register
&nbsp;&nbsp;&nbsp;&nbsp;popw&nbsp;[si]&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;;&nbsp;restore&nbsp;memory

</pre>

<div class="p"><!----></div>
<a 
id="popa2130"></a><a 
id="popaw2131"></a><a 
id="popad2132"></a><tt>popa</tt> restores the registers saved on the stack by <tt>pusha</tt>
instruction, except for the saved value of <tt>sp</tt> (or <tt>esp</tt>),
which is ignored. This instruction has no operands. To force assembling
16-bit or 32-bit version of this instruction use <tt>popaw</tt> or
<tt>popad</tt> mnemonic.
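
<div class="p"><!----></div>
A typical use of this pair, sketched here under the assumption of 32-bit code, is to preserve all general registers around a fragment that modifies them. The instructions between the two mnemonics are arbitrary placeholders:

<pre>
&nbsp;&nbsp;&nbsp;&nbsp;pushad&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;;&nbsp;save&nbsp;the&nbsp;eight&nbsp;32-bit&nbsp;general&nbsp;registers
&nbsp;&nbsp;&nbsp;&nbsp;mov&nbsp;eax,1&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;;&nbsp;registers&nbsp;can&nbsp;be&nbsp;modified&nbsp;freely&nbsp;here
&nbsp;&nbsp;&nbsp;&nbsp;mov&nbsp;ecx,2
&nbsp;&nbsp;&nbsp;&nbsp;popad&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;;&nbsp;restore&nbsp;them&nbsp;(the&nbsp;saved&nbsp;esp&nbsp;value&nbsp;is&nbsp;ignored)

</pre>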

<div class="p"><!----></div>
     <a id="tth_sEc2.1.2"></a><h3>
................................................................................
The type conversion instructions convert bytes into words, words into double
words, and double words into quad words. These conversions can be done using
the sign extension or zero extension. The sign extension fills the extra bits
of the larger item with the value of the sign bit of the smaller item;
the zero extension simply fills them with zeros.

<div class="p"><!----></div>
<a 
id="cwd2133"></a><a 
id="cdq2134"></a><tt>cwd</tt> and <tt>cdq</tt> double the size of the value in the <tt>ax</tt> or <tt>eax</tt>
register respectively and store the extra bits into the <tt>dx</tt> or
<tt>edx</tt> register. The conversion is done using the sign extension.
These instructions have no operands.
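
<div class="p"><!----></div>
The sketch below, with arbitrarily chosen values, shows what these two instructions leave in the upper register for a negative and for a positive input:

<pre>
&nbsp;&nbsp;&nbsp;&nbsp;mov&nbsp;ax,-5&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;;&nbsp;ax&nbsp;=&nbsp;0FFFBh,&nbsp;sign&nbsp;bit&nbsp;is&nbsp;set
&nbsp;&nbsp;&nbsp;&nbsp;cwd&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;;&nbsp;dx&nbsp;=&nbsp;0FFFFh,&nbsp;so&nbsp;dx:ax&nbsp;holds&nbsp;-5&nbsp;as&nbsp;a&nbsp;double&nbsp;word
&nbsp;&nbsp;&nbsp;&nbsp;mov&nbsp;eax,5&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;;&nbsp;sign&nbsp;bit&nbsp;is&nbsp;clear
&nbsp;&nbsp;&nbsp;&nbsp;cdq&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;;&nbsp;edx&nbsp;=&nbsp;0,&nbsp;so&nbsp;edx:eax&nbsp;holds&nbsp;5&nbsp;as&nbsp;a&nbsp;quad&nbsp;word

</pre>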

<div class="p"><!----></div>
<a 
id="cbw2135"></a><a 
id="cwde2136"></a><tt>cbw</tt> extends the sign of the byte in <tt>al</tt> throughout <tt>ax</tt>,
and <tt>cwde</tt> extends the sign of the word in <tt>ax</tt> throughout
<tt>eax</tt>. These instructions also have no operands.

<div class="p"><!----></div>
<a 
id="movsx2137"></a><a 
id="movzx2138"></a><tt>movsx</tt> converts a byte to word or double word and a word to double word
using the sign extension. <tt>movzx</tt> does the same, but it uses the zero
extension. The source operand can be general register or memory, while the
destination operand must be a general register. For example:

<pre>
&nbsp;&nbsp;&nbsp;&nbsp;movsx&nbsp;ax,al&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;;&nbsp;byte&nbsp;register&nbsp;to&nbsp;word&nbsp;register
&nbsp;&nbsp;&nbsp;&nbsp;movsx&nbsp;edx,dl&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;;&nbsp;byte&nbsp;register&nbsp;to&nbsp;double&nbsp;word&nbsp;register
................................................................................
&nbsp;&nbsp;&nbsp;&nbsp;movsx&nbsp;eax,word&nbsp;[bx]&nbsp;;&nbsp;word&nbsp;memory&nbsp;to&nbsp;double&nbsp;word&nbsp;register

</pre>
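
<div class="p"><!----></div>
To contrast the two kinds of extension, the following sketch applies both instructions to the same byte value (the value 80h is chosen only because its sign bit is set):

<pre>
&nbsp;&nbsp;&nbsp;&nbsp;mov&nbsp;bl,80h&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;;&nbsp;byte&nbsp;with&nbsp;the&nbsp;sign&nbsp;bit&nbsp;set
&nbsp;&nbsp;&nbsp;&nbsp;movsx&nbsp;ax,bl&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;;&nbsp;sign&nbsp;extension:&nbsp;ax&nbsp;=&nbsp;0FF80h
&nbsp;&nbsp;&nbsp;&nbsp;movzx&nbsp;cx,bl&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;;&nbsp;zero&nbsp;extension:&nbsp;cx&nbsp;=&nbsp;0080h

</pre>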

<div class="p"><!----></div>
     <a id="tth_sEc2.1.3"></a><h3>
2.1.3&nbsp;&nbsp;Binary arithmetic instructions</h3>
<a 
id="add2139"></a>
<tt>add</tt> replaces the destination operand with the sum of the source and
destination operands and sets CF if a carry out of the result (unsigned overflow) has occurred. The operands may
be bytes, words or double words. The destination operand can be a general
register or memory; the source operand can be a general register or an immediate
value, and it can also be memory if the destination operand is a register.

<pre>
................................................................................
&nbsp;&nbsp;&nbsp;&nbsp;add&nbsp;[di],al&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;;&nbsp;add&nbsp;register&nbsp;to&nbsp;memory
&nbsp;&nbsp;&nbsp;&nbsp;add&nbsp;al,48&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;;&nbsp;add&nbsp;immediate&nbsp;value&nbsp;to&nbsp;register
&nbsp;&nbsp;&nbsp;&nbsp;add&nbsp;[char],48&nbsp;&nbsp;&nbsp;;&nbsp;add&nbsp;immediate&nbsp;value&nbsp;to&nbsp;memory

</pre>

<div class="p"><!----></div>
<a 
id="adc2140"></a><tt>adc</tt> sums the operands, adds one if CF is set, and replaces the
destination operand with the result. Rules for the operands are the same as
for the <tt>add</tt> instruction. An <tt>add</tt> followed by multiple <tt>adc</tt>
instructions can be used to add numbers longer than 32 bits.
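
<div class="p"><!----></div>
As a minimal sketch of such a chain, the following adds one 64-bit value to
another. Keeping the first value in <tt>edx</tt> and <tt>eax</tt> and the second in
<tt>ecx</tt> and <tt>ebx</tt> is only an assumption made for this illustration:

<pre>
&nbsp;&nbsp;&nbsp;&nbsp;add&nbsp;eax,ebx&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;;&nbsp;add&nbsp;low&nbsp;double&nbsp;words,&nbsp;CF&nbsp;receives&nbsp;the&nbsp;carry
&nbsp;&nbsp;&nbsp;&nbsp;adc&nbsp;edx,ecx&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;;&nbsp;add&nbsp;high&nbsp;double&nbsp;words&nbsp;plus&nbsp;the&nbsp;carry

</pre>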

<div class="p"><!----></div>
<a 
id="inc2141"></a><tt>inc</tt> adds one to the operand, it does not affect CF. The operand can be
a general register or memory, and the size of the operand can be byte, word or double word.

<pre>
&nbsp;&nbsp;&nbsp;&nbsp;inc&nbsp;ax&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;;&nbsp;increment&nbsp;register&nbsp;by&nbsp;one
&nbsp;&nbsp;&nbsp;&nbsp;inc&nbsp;byte&nbsp;[bx]&nbsp;&nbsp;&nbsp;;&nbsp;increment&nbsp;memory&nbsp;by&nbsp;one

</pre>

<div class="p"><!----></div>
<a 
id="sub2142"></a><tt>sub</tt> subtracts the source operand from the destination operand and
replaces the destination operand with the result. If a borrow is required,
the CF is set. Rules for the operands are the same as for the <tt>add</tt>
instruction.

<div class="p"><!----></div>
<a 
id="sbb2143"></a><tt>sbb</tt> subtracts the source operand from the destination operand,
subtracts one if CF is set, and stores the result to the destination operand.
Rules for the operands are the same as for the <tt>add</tt> instruction.
A <tt>sub</tt> followed by multiple <tt>sbb</tt> instructions may be used to
subtract numbers longer than 32 bits.
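
<div class="p"><!----></div>
A minimal sketch of such a chain, assuming for illustration only that the
64-bit minuend is kept in <tt>edx</tt> and <tt>eax</tt> and the subtrahend in
<tt>ecx</tt> and <tt>ebx</tt>:

<pre>
&nbsp;&nbsp;&nbsp;&nbsp;sub&nbsp;eax,ebx&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;;&nbsp;subtract&nbsp;low&nbsp;double&nbsp;words,&nbsp;CF&nbsp;receives&nbsp;the&nbsp;borrow
&nbsp;&nbsp;&nbsp;&nbsp;sbb&nbsp;edx,ecx&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;;&nbsp;subtract&nbsp;high&nbsp;double&nbsp;words&nbsp;and&nbsp;the&nbsp;borrow

</pre>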

<div class="p"><!----></div>
<a 
id="dec2144"></a><tt>dec</tt> subtracts one from the operand, it does not affect CF. Rules for
the operand are the same as for the <tt>inc</tt> instruction.

<div class="p"><!----></div>
<a 
id="cmp2145"></a><tt>cmp</tt> subtracts the source operand from the destination operand. It
updates the flags as the <tt>sub</tt> instruction, but does not alter the
source and destination operands. Rules for the operands are the same as for
the <tt>sub</tt> instruction.
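
<div class="p"><!----></div>
The flags left by <tt>cmp</tt> are usually consumed by a conditional jump. The
sketch below shows one possible use; the labels <tt>is_equal</tt> and
<tt>is_below</tt> are hypothetical and would have to be defined elsewhere in
the program:

<pre>
&nbsp;&nbsp;&nbsp;&nbsp;cmp&nbsp;ax,bx&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;;&nbsp;set&nbsp;flags&nbsp;as&nbsp;for&nbsp;ax-bx,&nbsp;both&nbsp;registers&nbsp;unchanged
&nbsp;&nbsp;&nbsp;&nbsp;je&nbsp;is_equal&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;;&nbsp;taken&nbsp;when&nbsp;ZF&nbsp;is&nbsp;set
&nbsp;&nbsp;&nbsp;&nbsp;jb&nbsp;is_below&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;;&nbsp;taken&nbsp;when&nbsp;CF&nbsp;is&nbsp;set&nbsp;(unsigned&nbsp;ax&nbsp;below&nbsp;bx)

</pre>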

<div class="p"><!----></div>
<a 
id="neg2146"></a><tt>neg</tt> subtracts a signed integer operand from zero. The effect of this
instruction is to reverse the sign of the operand from positive to negative or
from negative to positive. Rules for the operand are the same as for the
<tt>inc</tt> instruction.

<div class="p"><!----></div>
<a 
id="xadd2147"></a><tt>xadd</tt> exchanges the destination operand with the source operand,
then loads the sum of the two values into the destination operand. The
destination operand can be a general register or memory, the source operand
must be a general register.

<div class="p"><!----></div>
All the above binary arithmetic instructions update SF, ZF, PF and OF flags.
SF is always set to the same value as the result's sign bit, ZF is set
when all the bits of result are zero, PF is set when low order eight bits of
result contain an even number of set bits, OF is set if result is too large for a
positive number or too small for a negative number (excluding sign bit) to fit in
destination operand.

<div class="p"><!----></div>
<a 
id="mul2148"></a><tt>mul</tt> performs an unsigned multiplication of the operand and the
accumulator. If the operand is a byte, the processor multiplies it by the
contents of <tt>al</tt> and returns the 16-bit result to <tt>ah</tt> and
<tt>al</tt>. If the operand is a word, the processor multiplies it by the
contents of <tt>ax</tt> and returns the 32-bit result to <tt>dx</tt> and
<tt>ax</tt>. If the operand is a double word, the processor multiplies it by
the contents of <tt>eax</tt> and returns the 64-bit result in <tt>edx</tt> and
<tt>eax</tt>. <tt>mul</tt> sets CF and OF when the upper half of the result is
nonzero, otherwise they are cleared. Rules for the operand are the same as
for the <tt>inc</tt> instruction.
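
<div class="p"><!----></div>
A small worked case of the word form, with operand values chosen only for
illustration:

<pre>
&nbsp;&nbsp;&nbsp;&nbsp;mov&nbsp;ax,1000
&nbsp;&nbsp;&nbsp;&nbsp;mov&nbsp;bx,100
&nbsp;&nbsp;&nbsp;&nbsp;mul&nbsp;bx&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;;&nbsp;product&nbsp;is&nbsp;100000&nbsp;(186A0h):&nbsp;dx&nbsp;=&nbsp;1,&nbsp;ax&nbsp;=&nbsp;86A0h
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;;&nbsp;CF&nbsp;and&nbsp;OF&nbsp;are&nbsp;set,&nbsp;because&nbsp;dx&nbsp;is&nbsp;nonzero

</pre>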

<div class="p"><!----></div>
<a 
id="imul2149"></a><tt>imul</tt> performs a signed multiplication operation. This
instruction has three variations. First has one operand and
behaves in the same way as the <tt>mul</tt> instruction. Second has
two operands, in this case destination operand is multiplied by
the source operand and the result replaces the destination
operand. Destination operand must be a general register, it can be
word or double word, source operand can be general register,
memory or immediate value. Third form has three operands, the
................................................................................
&nbsp;&nbsp;&nbsp;&nbsp;imul&nbsp;bx,10&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;;&nbsp;register&nbsp;by&nbsp;immediate&nbsp;value
&nbsp;&nbsp;&nbsp;&nbsp;imul&nbsp;ax,bx,10&nbsp;&nbsp;&nbsp;;&nbsp;register&nbsp;by&nbsp;immediate&nbsp;value&nbsp;to&nbsp;register
&nbsp;&nbsp;&nbsp;&nbsp;imul&nbsp;ax,[si],10&nbsp;;&nbsp;memory&nbsp;by&nbsp;immediate&nbsp;value&nbsp;to&nbsp;register

</pre>

<div class="p"><!----></div>
<a 
id="div2150"></a><tt>div</tt> performs an unsigned division of the accumulator by the operand.
The dividend (the accumulator) is twice the size of the divisor (the
operand), the quotient and remainder have the same size as the divisor.
If divisor is byte, the dividend is taken from <tt>ax</tt> register, the
quotient is stored in <tt>al</tt> and the remainder is stored in <tt>ah</tt>.
If divisor is word, the upper half of dividend is taken from <tt>dx</tt>,
the lower half of dividend is taken from <tt>ax</tt>, the quotient is stored
in <tt>ax</tt> and the remainder is stored in <tt>dx</tt>. If divisor is double
word, the upper half of dividend is taken from <tt>edx</tt>, the lower half of
dividend is taken from <tt>eax</tt>, the quotient is stored in <tt>eax</tt> and
the remainder is stored in <tt>edx</tt>. Rules for the operand are the same as
for the <tt>mul</tt> instruction.
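
<div class="p"><!----></div>
A small worked case of the byte form, with operand values chosen only for
illustration:

<pre>
&nbsp;&nbsp;&nbsp;&nbsp;mov&nbsp;ax,100&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;;&nbsp;dividend&nbsp;in&nbsp;ax
&nbsp;&nbsp;&nbsp;&nbsp;mov&nbsp;bl,7&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;;&nbsp;byte&nbsp;divisor
&nbsp;&nbsp;&nbsp;&nbsp;div&nbsp;bl&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;;&nbsp;al&nbsp;=&nbsp;14&nbsp;(quotient),&nbsp;ah&nbsp;=&nbsp;2&nbsp;(remainder)

</pre>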

<div class="p"><!----></div>
<a 
id="idiv2151"></a><tt>idiv</tt> performs a signed division of the accumulator by the operand.
It uses the same registers as the <tt>div</tt> instruction, and the rules for
the operand are the same.
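
<div class="p"><!----></div>
Because the dividend is twice the size of the divisor, a signed value is
typically sign-extended first, for instance with <tt>cdq</tt>. A minimal sketch,
with values chosen only for illustration:

<pre>
&nbsp;&nbsp;&nbsp;&nbsp;mov&nbsp;eax,-100
&nbsp;&nbsp;&nbsp;&nbsp;cdq&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;;&nbsp;extend&nbsp;the&nbsp;sign&nbsp;of&nbsp;eax&nbsp;into&nbsp;edx
&nbsp;&nbsp;&nbsp;&nbsp;mov&nbsp;ecx,7
&nbsp;&nbsp;&nbsp;&nbsp;idiv&nbsp;ecx&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;;&nbsp;eax&nbsp;=&nbsp;-14&nbsp;(quotient),&nbsp;edx&nbsp;=&nbsp;-2&nbsp;(remainder)

</pre>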

<div class="p"><!----></div>
     <a id="tth_sEc2.1.4"></a><h3>
2.1.4&nbsp;&nbsp;Decimal arithmetic instructions</h3>
Decimal arithmetic is performed by combining the binary arithmetic
................................................................................
arithmetic instructions. The decimal arithmetic instructions are used to
adjust the results of a previous binary arithmetic operation to produce a
valid packed or unpacked decimal result, or to adjust the inputs to a
subsequent binary arithmetic operation so the operation will produce a valid
packed or unpacked decimal result.

<div class="p"><!----></div>
<a 
id="daa2152"></a><tt>daa</tt> adjusts the result of adding two valid packed decimal operands in
<tt>al</tt>. <tt>daa</tt> must always follow the addition of two pairs of packed
decimal numbers (one digit in each half-byte) to obtain a pair of valid
packed decimal digits as results. The carry flag is set if carry was needed.
This instruction has no operands.
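
<div class="p"><!----></div>
A small worked case, adding the packed decimal numbers 38 and 45 (the values
are chosen only for illustration):

<pre>
&nbsp;&nbsp;&nbsp;&nbsp;mov&nbsp;al,38h&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;;&nbsp;packed&nbsp;decimal&nbsp;38
&nbsp;&nbsp;&nbsp;&nbsp;add&nbsp;al,45h&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;;&nbsp;binary&nbsp;sum&nbsp;is&nbsp;7Dh
&nbsp;&nbsp;&nbsp;&nbsp;daa&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;;&nbsp;al&nbsp;=&nbsp;83h,&nbsp;the&nbsp;packed&nbsp;decimal&nbsp;83,&nbsp;CF&nbsp;is&nbsp;clear

</pre>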

<div class="p"><!----></div>
<a 
id="das2153"></a><tt>das</tt> adjusts the result of subtracting two valid packed decimal
operands in <tt>al</tt>. <tt>das</tt> must always follow the subtraction of one
pair of packed decimal numbers (one digit in each half-byte) from another
to obtain a pair of valid packed decimal digits as results. The carry flag is
set if a borrow was needed. This instruction has no operands.

<div class="p"><!----></div>
<a 
id="aaa2154"></a><tt>aaa</tt> changes the contents of register <tt>al</tt> to a valid unpacked
decimal number, and zeroes the top four bits. <tt>aaa</tt> must always follow
the addition of two unpacked decimal operands in <tt>al</tt>. The carry flag is
set and <tt>ah</tt> is incremented if a carry is necessary. This instruction
has no operands.

<div class="p"><!----></div>
<a 
id="aas2155"></a><tt>aas</tt> changes the contents of register <tt>al</tt> to a valid unpacked
decimal number, and zeroes the top four bits. <tt>aas</tt> must always follow
the subtraction of one unpacked decimal operand from another in <tt>al</tt>.
The carry flag is set and <tt>ah</tt> decremented if a borrow is necessary.
This instruction has no operands.

<div class="p"><!----></div>
<a 
id="aam2156"></a><tt>aam</tt> corrects the result of a multiplication of two valid unpacked
decimal numbers. <tt>aam</tt> must always follow the multiplication of two
decimal numbers to produce a valid decimal result. The high order digit is
left in <tt>ah</tt>, the low order digit in <tt>al</tt>. The generalized version
of this instruction allows adjustment of the contents of the <tt>ax</tt> to
create two unpacked digits of any number base. The standard version of this
instruction has no operands, the generalized version has one operand - an
immediate value specifying the number base for the created digits.
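
<div class="p"><!----></div>
A small worked case of the standard version, multiplying the unpacked digits
7 and 9 (the values and the use of <tt>bl</tt> are chosen only for illustration):

<pre>
&nbsp;&nbsp;&nbsp;&nbsp;mov&nbsp;al,7
&nbsp;&nbsp;&nbsp;&nbsp;mov&nbsp;bl,9
&nbsp;&nbsp;&nbsp;&nbsp;mul&nbsp;bl&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;;&nbsp;ax&nbsp;=&nbsp;63&nbsp;(003Fh)
&nbsp;&nbsp;&nbsp;&nbsp;aam&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;;&nbsp;ah&nbsp;=&nbsp;6,&nbsp;al&nbsp;=&nbsp;3

</pre>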

<div class="p"><!----></div>
<a 
id="aad2157"></a><tt>aad</tt> modifies the numerator in <tt>ah</tt> and <tt>al</tt> to prepare for
the division of two valid unpacked decimal operands so that the quotient
produced by the division will be a valid unpacked decimal number. <tt>ah</tt>
should contain the high order digit and <tt>al</tt> the low order digit.
This instruction adjusts the value and places the result in <tt>al</tt>, while
<tt>ah</tt> will contain zero. The generalized version of this instruction
allows adjustment of two unpacked digits of any number base. Rules for the
operand are the same as for the <tt>aam</tt> instruction.
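
<div class="p"><!----></div>
A small worked case of the standard version, dividing the unpacked decimal
number 25 by 5 (the values and the use of <tt>bl</tt> are chosen only for
illustration):

<pre>
&nbsp;&nbsp;&nbsp;&nbsp;mov&nbsp;ax,0205h&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;;&nbsp;unpacked&nbsp;decimal&nbsp;25:&nbsp;ah&nbsp;=&nbsp;2,&nbsp;al&nbsp;=&nbsp;5
&nbsp;&nbsp;&nbsp;&nbsp;aad&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;;&nbsp;al&nbsp;=&nbsp;25&nbsp;(19h),&nbsp;ah&nbsp;=&nbsp;0
&nbsp;&nbsp;&nbsp;&nbsp;mov&nbsp;bl,5
&nbsp;&nbsp;&nbsp;&nbsp;div&nbsp;bl&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;;&nbsp;al&nbsp;=&nbsp;5&nbsp;(quotient),&nbsp;ah&nbsp;=&nbsp;0&nbsp;(remainder)

</pre>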

<div class="p"><!----></div>
     <a id="tth_sEc2.1.5"></a><h3>
2.1.5&nbsp;&nbsp;Logical instructions</h3>
<a 
id="not2158"></a>
<tt>not</tt> inverts the bits in the specified operand to form a one's
complement of the operand. It has no effect on the flags. Rules for the
operand are the same as for the <tt>inc</tt> instruction.

<div class="p"><!----></div>
<a 
id="and2159"></a><a 
id="or2160"></a><a 
id="xor2161"></a><tt>and</tt>, <tt>or</tt> and <tt>xor</tt> instructions perform the standard
logical operations. They update the SF, ZF and PF flags. Rules for the
operands are the same as for the <tt>add</tt> instruction.
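
<div class="p"><!----></div>
Some typical uses of these instructions, given here only as an illustrative
sketch:

<pre>
&nbsp;&nbsp;&nbsp;&nbsp;and&nbsp;al,0Fh&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;;&nbsp;keep&nbsp;the&nbsp;low&nbsp;four&nbsp;bits,&nbsp;clear&nbsp;the&nbsp;rest
&nbsp;&nbsp;&nbsp;&nbsp;or&nbsp;al,30h&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;;&nbsp;set&nbsp;bits&nbsp;4&nbsp;and&nbsp;5
&nbsp;&nbsp;&nbsp;&nbsp;xor&nbsp;eax,eax&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;;&nbsp;clear&nbsp;the&nbsp;whole&nbsp;register

</pre>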

<div class="p"><!----></div>
<a 
id="bt2162"></a><a 
id="bts2163"></a><a 
id="btr2164"></a><a 
id="btc2165"></a><tt>bt</tt>, <tt>bts</tt>, <tt>btr</tt> and <tt>btc</tt> instructions operate on a
single bit which can be in memory or in a general register. The location of
the bit is specified as an offset from the low order end of the operand.
The value of the offset is taken from the second operand, which may be
either an immediate byte or a general register. These instructions first assign
the value of the selected bit to CF. <tt>bt</tt> instruction does nothing more,
<tt>bts</tt> sets the selected bit to 1, <tt>btr</tt> resets the selected bit to
0, <tt>btc</tt> changes the bit to its complement. The first operand can be
................................................................................
&nbsp;&nbsp;&nbsp;&nbsp;bts&nbsp;word&nbsp;[bx],15&nbsp;;&nbsp;test&nbsp;and&nbsp;set&nbsp;bit&nbsp;in&nbsp;memory
&nbsp;&nbsp;&nbsp;&nbsp;btr&nbsp;ax,cx&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;;&nbsp;test&nbsp;and&nbsp;reset&nbsp;bit&nbsp;in&nbsp;register
&nbsp;&nbsp;&nbsp;&nbsp;btc&nbsp;word&nbsp;[bx],cx&nbsp;;&nbsp;test&nbsp;and&nbsp;complement&nbsp;bit&nbsp;in&nbsp;memory

</pre>

<div class="p"><!----></div>
<a 
id="bsf2166"></a><a 
id="bsr2167"></a><tt>bsf</tt> and <tt>bsr</tt> instructions scan a word or double word for first
set bit and store the index of this bit into destination operand, which must
be general register. The bit string being scanned is specified by source
operand, it may be either general register or memory. The ZF flag is set if
the entire string is zero (no set bits are found); otherwise it is cleared.
If no set bit is found, the value of the destination register is undefined.
<tt>bsf</tt> scans from low order to high order (starting from bit index zero).
<tt>bsr</tt> scans from high order to low order (starting from bit index 15 of
................................................................................
<pre>
&nbsp;&nbsp;&nbsp;&nbsp;bsf&nbsp;ax,bx&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;;&nbsp;scan&nbsp;register&nbsp;forward
&nbsp;&nbsp;&nbsp;&nbsp;bsr&nbsp;ax,[si]&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;;&nbsp;scan&nbsp;memory&nbsp;reverse

</pre>

<div class="p"><!----></div>
<a 
id="shl2168"></a><tt>shl</tt> shifts the destination operand left by the number of bits
specified in the second operand. The destination operand can be byte, word,
or double word general register or memory. The second operand can be an
immediate value or the <tt>cl</tt> register. The processor shifts zeros in from
the right (low order) side of the operand as bits exit from the left side.
The last bit that exited is stored in CF. <tt>sal</tt> is a synonym for
<tt>shl</tt>.

................................................................................
&nbsp;&nbsp;&nbsp;&nbsp;shl&nbsp;byte&nbsp;[bx],1&nbsp;&nbsp;;&nbsp;shift&nbsp;memory&nbsp;left&nbsp;by&nbsp;one&nbsp;bit
&nbsp;&nbsp;&nbsp;&nbsp;shl&nbsp;ax,cl&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;;&nbsp;shift&nbsp;register&nbsp;left&nbsp;by&nbsp;count&nbsp;from&nbsp;cl
&nbsp;&nbsp;&nbsp;&nbsp;shl&nbsp;word&nbsp;[bx],cl&nbsp;;&nbsp;shift&nbsp;memory&nbsp;left&nbsp;by&nbsp;count&nbsp;from&nbsp;cl

</pre>

<div class="p"><!----></div>
<a 
id="shr2169"></a><a 
id="sar2170"></a><tt>shr</tt> and <tt>sar</tt> shift the destination operand right by the number
of bits specified in the second operand. Rules for operands are the same as
for the <tt>shl</tt> instruction. <tt>shr</tt> shifts zeros in from the left side
of the operand as bits exit from the right side. The last bit that exited is
stored in CF. <tt>sar</tt> preserves the sign of the operand by shifting in
zeros on the left side if the value is positive or by shifting in ones if the
value is negative.
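
<div class="p"><!----></div>
The difference between the two instructions shows on a value with the highest
bit set. A small worked case, with registers and values chosen only for
illustration:

<pre>
&nbsp;&nbsp;&nbsp;&nbsp;mov&nbsp;al,0F8h&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;;&nbsp;-8&nbsp;as&nbsp;a&nbsp;signed&nbsp;byte
&nbsp;&nbsp;&nbsp;&nbsp;sar&nbsp;al,1&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;;&nbsp;al&nbsp;=&nbsp;0FCh&nbsp;(-4),&nbsp;CF&nbsp;=&nbsp;0
&nbsp;&nbsp;&nbsp;&nbsp;mov&nbsp;bl,0F8h&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;;&nbsp;248&nbsp;as&nbsp;an&nbsp;unsigned&nbsp;byte
&nbsp;&nbsp;&nbsp;&nbsp;shr&nbsp;bl,1&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;;&nbsp;bl&nbsp;=&nbsp;7Ch&nbsp;(124),&nbsp;CF&nbsp;=&nbsp;0

</pre>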

<div class="p"><!----></div>
<a 
id="shld2171"></a><tt>shld</tt> shifts bits of the destination operand to the left by the number
of bits specified in third operand, while shifting high order bits from the
source operand into the destination operand on the right. The source operand
remains unmodified. The destination operand can be a word or double word
general register or memory, the source operand must be a general register,
third operand can be an immediate value or the <tt>cl</tt> register.

<pre>
................................................................................
&nbsp;&nbsp;&nbsp;&nbsp;shld&nbsp;[di],bx,1&nbsp;&nbsp;&nbsp;;&nbsp;shift&nbsp;memory&nbsp;left&nbsp;by&nbsp;one&nbsp;bit
&nbsp;&nbsp;&nbsp;&nbsp;shld&nbsp;ax,bx,cl&nbsp;&nbsp;&nbsp;&nbsp;;&nbsp;shift&nbsp;register&nbsp;left&nbsp;by&nbsp;count&nbsp;from&nbsp;cl
&nbsp;&nbsp;&nbsp;&nbsp;shld&nbsp;[di],bx,cl&nbsp;&nbsp;;&nbsp;shift&nbsp;memory&nbsp;left&nbsp;by&nbsp;count&nbsp;from&nbsp;cl

</pre>

<div class="p"><!----></div>
<a 
id="shrd2172"></a><tt>shrd</tt> shifts bits of the destination operand to the right, while
shifting low order bits from the source operand into the destination operand
on the left. The source operand remains unmodified. Rules for operands are
the same as for the <tt>shld</tt> instruction.

<div class="p"><!----></div>
<a 
id="rol2173"></a><a 
id="rcl2174"></a><tt>rol</tt> and <tt>rcl</tt> rotate the byte, word or double word destination
operand left by the number of bits specified in the second operand. For each
rotation specified, the high order bit that exits from the left of the
operand returns at the right to become the new low order bit. <tt>rcl</tt>
additionally puts in CF each high order bit that exits from the left side
of the operand before it returns to the operand as the low order bit on the
next rotation cycle. Rules for operands are the same as for the <tt>shl</tt>
instruction.

<div class="p"><!----></div>
<a 
id="ror2175"></a><a 
id="rcr2176"></a><tt>ror</tt> and <tt>rcr</tt> rotate the byte, word or double word destination
operand right by the number of bits specified in the second operand. For each
rotation specified, the low order bit that exits from the right of the
operand returns at the left to become the new high order bit. <tt>rcr</tt>
additionally puts in CF each low order bit that exits from the right side of
the operand before it returns to the operand as the high order bit on the
next rotation cycle. Rules for operands are the same as for the <tt>shl</tt>
instruction.
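
<div class="p"><!----></div>
A small worked case of the plain rotations, with a value chosen only for
illustration:

<pre>
&nbsp;&nbsp;&nbsp;&nbsp;mov&nbsp;al,81h&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;;&nbsp;10000001b
&nbsp;&nbsp;&nbsp;&nbsp;rol&nbsp;al,1&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;;&nbsp;al&nbsp;=&nbsp;03h&nbsp;(00000011b)
&nbsp;&nbsp;&nbsp;&nbsp;ror&nbsp;al,2&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;;&nbsp;al&nbsp;=&nbsp;0C0h&nbsp;(11000000b)

</pre>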

<div class="p"><!----></div>
<a 
id="test2177"></a><tt>test</tt> performs the same action as the <tt>and</tt> instruction, but it
does not alter the destination operand, only updates flags. Rules for the
operands are the same as for the <tt>and</tt> instruction.
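
<div class="p"><!----></div>
A common use is checking a single bit without destroying the tested value.
In the sketch below the label <tt>is_odd</tt> is hypothetical and would have to
be defined elsewhere in the program:

<pre>
&nbsp;&nbsp;&nbsp;&nbsp;test&nbsp;al,1&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;;&nbsp;and&nbsp;al&nbsp;with&nbsp;1,&nbsp;only&nbsp;the&nbsp;flags&nbsp;change
&nbsp;&nbsp;&nbsp;&nbsp;jnz&nbsp;is_odd&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;;&nbsp;taken&nbsp;when&nbsp;bit&nbsp;0&nbsp;of&nbsp;al&nbsp;is&nbsp;set&nbsp;(ZF&nbsp;=&nbsp;0)

</pre>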

<div class="p"><!----></div>
<a 
id="bswap2178"></a><tt>bswap</tt> reverses the byte order of a 32-bit general register:
bits 0 through 7 are swapped with bits 24 through 31, and bits 8 through 15
are swapped with bits 16 through 23. This instruction is provided for
converting little-endian values to big-endian format and vice versa.

<pre>
&nbsp;&nbsp;&nbsp;&nbsp;bswap&nbsp;edx&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;;&nbsp;swap&nbsp;bytes&nbsp;in&nbsp;register

................................................................................
</pre>

<div class="p"><!----></div>
     <a id="tth_sEc2.1.6"></a><h3>
2.1.6&nbsp;&nbsp;Control transfer instructions</h3>

<div class="p"><!----></div>
<a 
id="jmp2179"></a><tt>jmp</tt> unconditionally transfers control to the target location. The
destination address can be specified directly within the instruction or
indirectly through a register or memory, the acceptable size of this address
depends on whether the jump is near or far (it can be specified by preceding
the operand with <tt>near</tt> or <tt>far</tt> operator) and whether the instruction is
16-bit or 32-bit. Operand for near jump should be <tt>word</tt> size for 16-bit
instruction or the <tt>dword</tt> size for 32-bit instruction. Operand for far jump
should be <tt>dword</tt> size for 16-bit instruction or <tt>pword</tt> size for 32-bit
................................................................................
&nbsp;&nbsp;&nbsp;&nbsp;jmp&nbsp;0FFFFh:0&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;;&nbsp;direct&nbsp;far&nbsp;jump
&nbsp;&nbsp;&nbsp;&nbsp;jmp&nbsp;ax&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;;&nbsp;indirect&nbsp;near&nbsp;jump
&nbsp;&nbsp;&nbsp;&nbsp;jmp&nbsp;pword&nbsp;[ebx]&nbsp;&nbsp;;&nbsp;indirect&nbsp;far&nbsp;jump

</pre>

<div class="p"><!----></div>
<a 
id="call2180"></a><tt>call</tt> transfers control to the procedure, saving on the stack the
address of the instruction following the <tt>call</tt> for later use by a
<tt>ret</tt> (return) instruction. Rules for the operands are the same as for
the <tt>jmp</tt> instruction, but the <tt>call</tt> has no short variant of
direct instruction and thus it is not optimized.

<div class="p"><!----></div>
<a 
id="ret2181"></a><a 
id="retn2182"></a><a 
id="retf2183"></a><a 
id="retw2184"></a><a 
id="retnw2185"></a><a 
id="retfw2186"></a><a 
id="retd2187"></a><a 
id="retnd2188"></a><a 
id="retfd2189"></a><tt>ret</tt>, <tt>retn</tt> and <tt>retf</tt> instructions terminate the execution
of a procedure and transfer control back to the program that originally
invoked the procedure using the address that was stored on the stack by the
<tt>call</tt> instruction. <tt>ret</tt> is the equivalent for <tt>retn</tt>, which
returns from the procedure that was executed using the near call, while
<tt>retf</tt> returns from the procedure that was executed using the far call.
These instructions default to the size of address appropriate for the current
code setting, but the size of address can be forced to 16-bit by using the
................................................................................
the <tt>retd</tt>, <tt>retnd</tt> and <tt>retfd</tt> mnemonics. All these
instructions may optionally specify an immediate operand, by adding this
constant to the stack pointer, they effectively remove any arguments that the
calling program pushed on the stack before the execution of the <tt>call</tt>
instruction.
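
<div class="p"><!----></div>
A minimal sketch of a procedure that releases its own arguments, assuming
32-bit code and a caller that pushed two double word arguments before a near
call. The procedure name <tt>sum2</tt> is hypothetical:

<pre>
sum2:
&nbsp;&nbsp;&nbsp;&nbsp;mov&nbsp;eax,[esp+4]&nbsp;&nbsp;&nbsp;;&nbsp;argument&nbsp;pushed&nbsp;last,&nbsp;just&nbsp;above&nbsp;the&nbsp;return&nbsp;address
&nbsp;&nbsp;&nbsp;&nbsp;add&nbsp;eax,[esp+8]&nbsp;&nbsp;&nbsp;;&nbsp;argument&nbsp;pushed&nbsp;first
&nbsp;&nbsp;&nbsp;&nbsp;retn&nbsp;8&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;;&nbsp;return&nbsp;and&nbsp;remove&nbsp;8&nbsp;bytes&nbsp;of&nbsp;arguments

</pre>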

<div class="p"><!----></div>
<a 
id="iret2190"></a><a 
id="iretw2191"></a><a 
id="iretd2192"></a><tt>iret</tt> returns control to an interrupted procedure. It differs from
<tt>ret</tt> in that it also pops the flags from the stack into the flags
register. The flags are stored on the stack by the interrupt mechanism. It
defaults to the size of return address appropriate for the current code
setting, but it can be forced to use 16-bit or 32-bit address by using the
<tt>iretw</tt> or <tt>iretd</tt> mnemonic.

<div class="p"><!----></div>
<a 
id="jo2193"></a><a 
id="jno2194"></a><a 
id="jc2195"></a><a 
id="jb2196"></a><a 
id="jnae2197"></a><a 
id="jnc2198"></a><a 
id="jae2199"></a><a 
id="jnb21100"></a><a 
id="je21101"></a><a 
id="jz21102"></a><a 
id="jne21103"></a><a 
id="jnz21104"></a><a 
id="jbe21105"></a><a 
id="jna21106"></a><a 
id="ja21107"></a><a 
id="jnbe21108"></a><a 
id="js21109"></a><a 
id="jns21110"></a><a 
id="jp21111"></a><a 
id="jpe21112"></a><a 
id="jnp21113"></a><a 
id="jpo21114"></a><a 
id="jl21115"></a><a 
id="jnge21116"></a><a 
id="jge21117"></a><a 
id="jnl21118"></a><a 
id="jle21119"></a><a 
id="jng21120"></a><a 
id="jg21121"></a><a 
id="jnle21122"></a>The conditional transfer instructions are jumps that may or may not transfer
control, depending on the state of the CPU flags when the instruction
executes. The mnemonics for conditional jumps may be obtained by attaching
the condition mnemonic (see table <a href="#tab:conditions">2.1</a>) to the <tt>j</tt>
mnemonic, for example <tt>jc</tt> instruction will transfer the control when
the CF flag is set. The conditional jumps can be short or near, and direct only, and
can be optimized (see <a href="#sec:jumps">1.2.5</a>), the operand should be an immediate
value specifying target address.
................................................................................

<div class="p"><!----></div>

<div class="p"><!----></div>
<a id="tth_tAb2.1">
</a> 
<div style="text-align:center">
<table border="1">
<tr><td align="center">Mnemonic </td><td align="center">Condition tested </td><td align="center">Description </td></tr><tr><td></td></tr>
<tr><td align="center"><tt>o</tt> </td><td align="center">OF = 1 </td><td align="center">overflow</td></tr>
<tr><td align="center"><tt>no</tt> </td><td align="center">OF = 0 </td><td align="center">not overflow</td></tr>
<tr><td align="center"><tt>c</tt> </td><td align="center">CF = 1 </td><td align="center">carry</td></tr>
<tr><td align="center"><tt>b</tt> </td><td align="center">CF = 1 </td><td align="center">below</td></tr>
<tr><td align="center"><tt>nae</tt> </td><td align="center">CF = 1 </td><td align="center">not above nor equal</td></tr>
<tr><td align="center"><tt>nc</tt> </td><td align="center">CF = 0 </td><td align="center">not carry</td></tr>
................................................................................
</div>

<div style="text-align:center">Table 2.1: Conditions.</div>
<a id="tab:conditions">
</a>

<div class="p"><!----></div>
<a 
id="loop21123"></a><a 
id="loopw21124"></a><a 
id="loopd21125"></a><a 
id="loope21126"></a><a 
id="loopz21127"></a><a 
id="loopew21128"></a><a 
id="loopzw21129"></a><a 
id="looped21130"></a><a 
id="loopzd21131"></a><a 
id="loopne21132"></a><a 
id="loopnz21133"></a><a 
id="loopnew21134"></a><a 
id="loopnzw21135"></a><a 
id="loopned21136"></a><a 

id="loopnzd21137"></a>The <tt>loop</tt> instructions are conditional jumps that use a value placed in
<tt>cx</tt> (or <tt>ecx</tt>) to specify the number of repetitions of a software
loop. All <tt>loop</tt> instructions automatically decrement <tt>cx</tt> (or
<tt>ecx</tt>) and terminate the loop (don't transfer the control) when
<tt>cx</tt> (or <tt>ecx</tt>) is zero. They use <tt>cx</tt> or <tt>ecx</tt> depending on whether
the current code setting is 16-bit or 32-bit, but they can be forced to use
<tt>cx</tt> with the <tt>loopw</tt> mnemonic or to use <tt>ecx</tt> with the
<tt>loopd</tt> mnemonic. <tt>loope</tt> and <tt>loopz</tt> are the synonyms for the
................................................................................
<tt>loopned</tt> and <tt>loopnzd</tt> force them to use <tt>ecx</tt> register.
Every <tt>loop</tt> instruction needs an operand that is an immediate value
specifying the target address, and it can be only a short jump (in the range of 128
bytes back and 127 bytes forward from the address of instruction following
the <tt>loop</tt> instruction).
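
<div class="p"><!----></div>
A minimal sketch of a counted loop, assuming 16-bit code so that <tt>cx</tt>
is the counter. The label <tt>again</tt> and the loop body are chosen only for
illustration:

<pre>
&nbsp;&nbsp;&nbsp;&nbsp;mov&nbsp;cx,5&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;;&nbsp;number&nbsp;of&nbsp;repetitions
&nbsp;&nbsp;&nbsp;&nbsp;xor&nbsp;ax,ax
again:
&nbsp;&nbsp;&nbsp;&nbsp;add&nbsp;ax,cx&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;;&nbsp;loop&nbsp;body,&nbsp;adds&nbsp;5,&nbsp;4,&nbsp;3,&nbsp;2&nbsp;and&nbsp;1
&nbsp;&nbsp;&nbsp;&nbsp;loop&nbsp;again&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;;&nbsp;decrement&nbsp;cx,&nbsp;jump&nbsp;back&nbsp;while&nbsp;cx&nbsp;is&nbsp;not&nbsp;zero
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;;&nbsp;here&nbsp;ax&nbsp;=&nbsp;15&nbsp;and&nbsp;cx&nbsp;=&nbsp;0

</pre>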

<div class="p"><!----></div>
<a 
id="jcxz21138"></a><a 
id="jecxz21139"></a><tt>jcxz</tt> branches to the label specified in the instruction if it finds a
value of zero in <tt>cx</tt>, <tt>jecxz</tt> does the same, but checks the value
of <tt>ecx</tt> instead of <tt>cx</tt>. Rules for the operands are the same as
for the <tt>loop</tt> instruction.
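
<div class="p"><!----></div>
One possible use is guarding a <tt>loop</tt> against a zero counter. The
sketch assumes 16-bit code; the labels <tt>again</tt> and <tt>skip</tt> and the
loop body are chosen only for illustration:

<pre>
&nbsp;&nbsp;&nbsp;&nbsp;jcxz&nbsp;skip&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;;&nbsp;with&nbsp;cx&nbsp;=&nbsp;0&nbsp;the&nbsp;loop&nbsp;would&nbsp;repeat&nbsp;65536&nbsp;times
again:
&nbsp;&nbsp;&nbsp;&nbsp;inc&nbsp;ax&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;;&nbsp;loop&nbsp;body
&nbsp;&nbsp;&nbsp;&nbsp;loop&nbsp;again
skip:

</pre>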

<div class="p"><!----></div>
<a 
id="int21140"></a><a 
id="int321141"></a><a 
id="into21142"></a><tt>int</tt> activates the interrupt service routine that corresponds to the
number specified as an operand to the instruction, the number should be in
range from 0 to 255. The interrupt service routine terminates with an
<tt>iret</tt> instruction that returns control to the instruction that follows
<tt>int</tt>. <tt>int3</tt> mnemonic codes the short (one byte) trap that invokes
the interrupt 3. <tt>into</tt> instruction invokes the interrupt 4 if the OF
flag is set.

<div class="p"><!----></div>
<a 
id="bound21143"></a><tt>bound</tt> verifies that the signed value contained in the specified
register lies within specified limits. An interrupt 5 occurs if the value
contained in the register is less than the lower bound or greater than the
upper bound. It needs two operands, the first operand specifies the register
being tested, the second operand should be memory address for the two signed
limit values. The operands can be <tt>word</tt> or <tt>dword</tt> in size.

<pre>
................................................................................
</pre>

<div class="p"><!----></div>
     <a id="tth_sEc2.1.7"></a><h3>
2.1.7&nbsp;&nbsp;I/O instructions</h3>

<div class="p"><!----></div>
<a 
id="in21144"></a><tt>in</tt> transfers a byte, word, or double word from an input port to
<tt>al</tt>, <tt>ax</tt>, or <tt>eax</tt>. I/O ports can be addressed either
directly, with the immediate byte value coded in instruction, or indirectly
via the <tt>dx</tt> register. The destination operand should be <tt>al</tt>,
<tt>ax</tt>, or <tt>eax</tt> register. The source operand should be an immediate
value in range from 0 to 255, or <tt>dx</tt> register.

<pre>
&nbsp;&nbsp;&nbsp;&nbsp;in&nbsp;al,20h&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;;&nbsp;input&nbsp;byte&nbsp;from&nbsp;port&nbsp;20h
&nbsp;&nbsp;&nbsp;&nbsp;in&nbsp;ax,dx&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;;&nbsp;input&nbsp;word&nbsp;from&nbsp;port&nbsp;addressed&nbsp;by&nbsp;dx

</pre>

<div class="p"><!----></div>
<a 
id="out21145"></a><tt>out</tt> transfers a byte, word, or double word to an output port from
<tt>al</tt>, <tt>ax</tt>, or <tt>eax</tt>. The program can specify the number of
the port using the same methods as the <tt>in</tt> instruction. The destination
operand should be an immediate value in range from 0 to 255, or <tt>dx</tt>
register. The source operand should be <tt>al</tt>, <tt>ax</tt>, or <tt>eax</tt>
register.

<pre>
................................................................................
of string element, it should be <tt>b</tt> for byte element, <tt>w</tt> for word
element, and <tt>d</tt> for double word element. Full form of string operation
needs operands providing the size operator and the memory addresses, which
can be <tt>si</tt> or <tt>esi</tt> with any segment prefix, <tt>di</tt> or
<tt>edi</tt> always with <tt>es</tt> segment prefix.

<div class="p"><!----></div>
<a 
id="movs21146"></a><a 
id="movsb21147"></a><a 
id="movsw21148"></a><a 
id="movsd21149"></a><tt>movs</tt> transfers the string element pointed to by <tt>si</tt> (or
<tt>esi</tt>) to the location pointed to by <tt>di</tt> (or <tt>edi</tt>). Size of
operands can be <tt>byte</tt>, <tt>word</tt> or <tt>dword</tt>. The destination
operand should be memory addressed by <tt>di</tt> or <tt>edi</tt>, the source
operand should be memory addressed by <tt>si</tt> or <tt>esi</tt> with any
segment prefix.

<pre>
................................................................................
&nbsp;&nbsp;&nbsp;&nbsp;movs&nbsp;byte&nbsp;[di],[si]&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;;&nbsp;transfer&nbsp;byte
&nbsp;&nbsp;&nbsp;&nbsp;movs&nbsp;word&nbsp;[es:di],[ss:si]&nbsp;&nbsp;;&nbsp;transfer&nbsp;word
&nbsp;&nbsp;&nbsp;&nbsp;movsd&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;;&nbsp;transfer&nbsp;double&nbsp;word

</pre>

<div class="p"><!----></div>
<a 
id="cmps21150"></a><a 
id="cmpsb21151"></a><a 
id="cmpsw21152"></a><a 
id="cmpsd21153"></a><tt>cmps</tt> subtracts the destination string element from the source string
element and updates the flags AF, SF, PF, CF and OF, but it does not change
any of the compared elements. If the string elements are equal, ZF is set,
otherwise it is cleared. The first operand for this instruction should be the
source string element addressed by <tt>si</tt> or <tt>esi</tt> with any segment
prefix, the second operand should be the destination string element addressed
by <tt>di</tt> or <tt>edi</tt>.

................................................................................
&nbsp;&nbsp;&nbsp;&nbsp;cmpsb&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;;&nbsp;compare&nbsp;bytes
&nbsp;&nbsp;&nbsp;&nbsp;cmps&nbsp;word&nbsp;[ds:si],[es:di]&nbsp;&nbsp;;&nbsp;compare&nbsp;words
&nbsp;&nbsp;&nbsp;&nbsp;cmps&nbsp;dword&nbsp;[fs:esi],[edi]&nbsp;&nbsp;;&nbsp;compare&nbsp;double&nbsp;words

</pre>

<div class="p"><!----></div>
<a 
id="scas21154"></a><a 
id="scasb21155"></a><a 
id="scasw21156"></a><a 
id="scasd21157"></a><tt>scas</tt> subtracts the destination string element from <tt>al</tt>,
<tt>ax</tt>, or <tt>eax</tt> (depending on the size of string element) and
updates the flags AF, SF, ZF, PF, CF and OF. If the values are equal, ZF is
set, otherwise it is cleared. The operand should be the destination string
element addressed by <tt>di</tt> or <tt>edi</tt>.

<pre>
&nbsp;&nbsp;&nbsp;&nbsp;scas&nbsp;byte&nbsp;[es:di]&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;;&nbsp;scan&nbsp;byte
&nbsp;&nbsp;&nbsp;&nbsp;scasw&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;;&nbsp;scan&nbsp;word
&nbsp;&nbsp;&nbsp;&nbsp;scas&nbsp;dword&nbsp;[es:edi]&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;;&nbsp;scan&nbsp;double&nbsp;word

</pre>

<div class="p"><!----></div>
<a 
id="lods21158"></a><a 
id="lodsb21159"></a><a 
id="lodsw21160"></a><a 
id="lodsd21161"></a><tt>lods</tt> places the source string element into <tt>al</tt>, <tt>ax</tt>, or
<tt>eax</tt>. The operand should be the source string element addressed by
<tt>si</tt> or <tt>esi</tt> with any segment prefix.

<pre>
&nbsp;&nbsp;&nbsp;&nbsp;lods&nbsp;byte&nbsp;[ds:si]&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;;&nbsp;load&nbsp;byte
&nbsp;&nbsp;&nbsp;&nbsp;lods&nbsp;word&nbsp;[cs:si]&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;;&nbsp;load&nbsp;word
&nbsp;&nbsp;&nbsp;&nbsp;lodsd&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;;&nbsp;load&nbsp;double&nbsp;word

</pre>

<div class="p"><!----></div>
<a 
id="stos21162"></a><a 
id="stosb21163"></a><a 
id="stosw21164"></a><a 
id="stosd21165"></a><tt>stos</tt> places the value of <tt>al</tt>, <tt>ax</tt>, or <tt>eax</tt> into the
destination string element. Rules for the operand are the same as for the
<tt>scas</tt> instruction.
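
<div class="p"><!----></div>
Following the operand rules of <tt>scas</tt>, typical forms of <tt>stos</tt>
might look as sketched below; the registers chosen are only illustrative.

<pre>
&nbsp;&nbsp;&nbsp;&nbsp;stos&nbsp;byte&nbsp;[es:di]&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;;&nbsp;store&nbsp;byte
&nbsp;&nbsp;&nbsp;&nbsp;stosw&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;;&nbsp;store&nbsp;word
&nbsp;&nbsp;&nbsp;&nbsp;stos&nbsp;dword&nbsp;[es:edi]&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;;&nbsp;store&nbsp;double&nbsp;word

</pre>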

<div class="p"><!----></div>
<a 
id="ins21166"></a><a 
id="insb21167"></a><a 
id="insw21168"></a><a 
id="insd21169"></a><tt>ins</tt> transfers a byte, word, or double word from an input port
addressed by <tt>dx</tt> register to the destination string element. The
destination operand should be memory addressed by <tt>di</tt> or <tt>edi</tt>,
the source operand should be the <tt>dx</tt> register.

<pre>
&nbsp;&nbsp;&nbsp;&nbsp;insb&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;;&nbsp;input&nbsp;byte
&nbsp;&nbsp;&nbsp;&nbsp;ins&nbsp;word&nbsp;[es:di],dx&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;;&nbsp;input&nbsp;word
&nbsp;&nbsp;&nbsp;&nbsp;ins&nbsp;dword&nbsp;[edi],dx&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;;&nbsp;input&nbsp;double&nbsp;word

</pre>

<div class="p"><!----></div>
<a 
id="outs21170"></a><a 
id="outsb21171"></a><a 
id="outsw21172"></a><a 
id="outsd21173"></a><tt>outs</tt> transfers the source string element to an output port addressed
by <tt>dx</tt> register. The destination operand should be the <tt>dx</tt>
register and the source operand should be memory addressed by <tt>si</tt> or
<tt>esi</tt> with any segment prefix.

<pre>
&nbsp;&nbsp;&nbsp;&nbsp;outs&nbsp;dx,byte&nbsp;[si]&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;;&nbsp;output&nbsp;byte
&nbsp;&nbsp;&nbsp;&nbsp;outsw&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;;&nbsp;output&nbsp;word
&nbsp;&nbsp;&nbsp;&nbsp;outs&nbsp;dx,dword&nbsp;[gs:esi]&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;;&nbsp;output&nbsp;double&nbsp;word

</pre>

<div class="p"><!----></div>
<a 
id="rep21174"></a><a 
id="repe21175"></a><a 
id="repz21176"></a><a 
id="repne21177"></a><a 
id="repnz21178"></a>The repeat prefixes <tt>rep</tt>, <tt>repe</tt>/<tt>repz</tt>, and
<tt>repne</tt>/<tt>repnz</tt> specify repeated string operation. When a string
operation instruction has a repeat prefix, the operation is executed
repeatedly, each time using a different element of the string. The repetition
terminates when one of the conditions specified by the prefix is satisfied.
All three prefixes automatically decrease <tt>cx</tt> or <tt>ecx</tt> register
(depending whether string operation instruction uses the 16-bit or 32-bit
addressing) after each operation and repeat the associated operation until
................................................................................

<div class="p"><!----></div>
The flag control instructions provide a method for directly changing the
state of bits in the flag register. All instructions described in this
section have no operands.

<div class="p"><!----></div>
<a 
id="stc21179"></a><a 
id="clc21180"></a><a 
id="cmc21181"></a><a 
id="std21182"></a><a 
id="cld21183"></a><a 
id="sti21184"></a><a 
id="cli21185"></a><tt>stc</tt> sets the CF (carry flag) to 1, <tt>clc</tt> zeroes the CF,
<tt>cmc</tt> changes the CF to its complement. <tt>std</tt> sets the DF
(direction flag) to 1, <tt>cld</tt> zeroes the DF, <tt>sti</tt> sets the IF
(interrupt flag) to 1 and therefore enables the interrupts, <tt>cli</tt> zeroes
the IF and therefore disables the interrupts.
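
<div class="p"><!----></div>
A short illustrative sequence (only a sketch, the surrounding code is
assumed) shows the effect of some of these instructions:

<pre>
&nbsp;&nbsp;&nbsp;&nbsp;stc&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;;&nbsp;CF&nbsp;=&nbsp;1
&nbsp;&nbsp;&nbsp;&nbsp;cmc&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;;&nbsp;CF&nbsp;=&nbsp;0,&nbsp;the&nbsp;complement&nbsp;of&nbsp;its&nbsp;previous&nbsp;value
&nbsp;&nbsp;&nbsp;&nbsp;cld&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;;&nbsp;DF&nbsp;=&nbsp;0

</pre>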

<div class="p"><!----></div>
<a 
id="lahf21186"></a><tt>lahf</tt> copies SF, ZF, AF, PF, and CF to bits 7, 6, 4, 2, and 0 of the
<tt>ah</tt> register. The contents of the remaining bits are undefined.
The flags remain unaffected.

<div class="p"><!----></div>
<a 
id="sahf21187"></a><tt>sahf</tt> transfers bits 7, 6, 4, 2, and 0 from the <tt>ah</tt> register
into SF, ZF, AF, PF, and CF.
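
<div class="p"><!----></div>
The two instructions can be paired to modify a single flag through
<tt>ah</tt>. The fragment below is a sketch only; note that <tt>or</tt> itself
alters the flags and clears OF, which <tt>sahf</tt> does not restore.

<pre>
&nbsp;&nbsp;&nbsp;&nbsp;lahf&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;;&nbsp;ah&nbsp;=&nbsp;image&nbsp;of&nbsp;SF,&nbsp;ZF,&nbsp;AF,&nbsp;PF&nbsp;and&nbsp;CF
&nbsp;&nbsp;&nbsp;&nbsp;or&nbsp;ah,40h&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;;&nbsp;set&nbsp;bit&nbsp;6,&nbsp;the&nbsp;image&nbsp;of&nbsp;ZF
&nbsp;&nbsp;&nbsp;&nbsp;sahf&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;;&nbsp;load&nbsp;the&nbsp;five&nbsp;flags&nbsp;from&nbsp;ah,&nbsp;ZF&nbsp;is&nbsp;now&nbsp;1

</pre>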

<div class="p"><!----></div>
<a 
id="pushf21188"></a><a 
id="pushfw21189"></a><a 
id="pushfd21190"></a><tt>pushf</tt> decrements <tt>esp</tt> by two or four and stores the low word or
double word of flags register at the top of stack; the size of stored data
depends on the current code setting. The <tt>pushfw</tt> variant forces storing the
word and <tt>pushfd</tt> forces storing the double word.

<div class="p"><!----></div>
<a 
id="popf21191"></a><a 
id="popfw21192"></a><a 
id="popfd21193"></a><tt>popf</tt> transfers specific bits from the word or double word at the top
of stack, then increments <tt>esp</tt> by two or four; this value depends on
the current code setting. The <tt>popfw</tt> variant forces restoring from the word
and <tt>popfd</tt> forces restoring from the double word.
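
<div class="p"><!----></div>
As a sketch, assuming 32-bit code, the flags can be copied into a general
register and written back in this way:

<pre>
&nbsp;&nbsp;&nbsp;&nbsp;pushfd&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;;&nbsp;store&nbsp;the&nbsp;double&nbsp;word&nbsp;of&nbsp;flags&nbsp;on&nbsp;the&nbsp;stack
&nbsp;&nbsp;&nbsp;&nbsp;pop&nbsp;eax&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;;&nbsp;eax&nbsp;now&nbsp;holds&nbsp;a&nbsp;copy&nbsp;of&nbsp;the&nbsp;flags
&nbsp;&nbsp;&nbsp;&nbsp;push&nbsp;eax&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;;&nbsp;put&nbsp;the&nbsp;(possibly&nbsp;modified)&nbsp;value&nbsp;back
&nbsp;&nbsp;&nbsp;&nbsp;popfd&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;;&nbsp;load&nbsp;the&nbsp;flags&nbsp;from&nbsp;it

</pre>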

<div class="p"><!----></div>
     <a id="tth_sEc2.1.10"></a><h3>
2.1.10&nbsp;&nbsp;Conditional operations</h3>

<div class="p"><!----></div>
<a 
id="seto21194"></a><a 
id="setno21195"></a><a 
id="setc21196"></a><a 
id="setb21197"></a><a 
id="setnae21198"></a><a 
id="setnc21199"></a><a 
id="setae21200"></a><a 
id="setnb21201"></a><a 
id="sete21202"></a><a 
id="setz21203"></a><a 
id="setne21204"></a><a 
id="setnz21205"></a><a 
id="setbe21206"></a><a 
id="setna21207"></a><a 
id="seta21208"></a><a 
id="setnbe21209"></a><a 
id="sets21210"></a><a 
id="setns21211"></a><a 
id="setp21212"></a><a 
id="setpe21213"></a><a 
id="setnp21214"></a><a 
id="setpo21215"></a><a 
id="setl21216"></a><a 
id="setnge21217"></a><a 
id="setge21218"></a><a 
id="setnl21219"></a><a 
id="setle21220"></a><a 
id="setng21221"></a><a 
id="setg21222"></a><a 
id="setnle21223"></a>The instructions obtained by attaching the condition mnemonic (see table
<a href="#tab:conditions">2.1</a>) to the <tt>set</tt> mnemonic set a byte to one if the
condition is true and set the byte to zero otherwise. The operand should be
an 8-bit general register or a byte in memory.

<pre>
&nbsp;&nbsp;&nbsp;&nbsp;setne&nbsp;al&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;;&nbsp;set&nbsp;al&nbsp;if&nbsp;zero&nbsp;flag&nbsp;cleared
&nbsp;&nbsp;&nbsp;&nbsp;seto&nbsp;byte&nbsp;[bx]&nbsp;&nbsp;&nbsp;;&nbsp;set&nbsp;byte&nbsp;if&nbsp;overflow

</pre>

<div class="p"><!----></div>
<a 
id="salc21224"></a><tt>salc</tt> instruction sets all bits of <tt>al</tt> register when the
carry flag is set and zeroes the <tt>al</tt> register otherwise. This
instruction has no arguments.
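
<div class="p"><!----></div>
A minimal sketch of both outcomes, with the carry flag forced by
<tt>stc</tt> and <tt>clc</tt> purely for illustration:

<pre>
&nbsp;&nbsp;&nbsp;&nbsp;stc&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;;&nbsp;CF&nbsp;=&nbsp;1
&nbsp;&nbsp;&nbsp;&nbsp;salc&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;;&nbsp;al&nbsp;=&nbsp;0FFh
&nbsp;&nbsp;&nbsp;&nbsp;clc&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;;&nbsp;CF&nbsp;=&nbsp;0
&nbsp;&nbsp;&nbsp;&nbsp;salc&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;;&nbsp;al&nbsp;=&nbsp;0

</pre>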

<div class="p"><!----></div>
<a 
id="cmovo21225"></a><a 
id="cmovno21226"></a><a 
id="cmovc21227"></a><a 
id="cmovb21228"></a><a 
id="cmovnae21229"></a><a 
id="cmovnc21230"></a><a 
id="cmovae21231"></a><a 
id="cmovnb21232"></a><a 
id="cmove21233"></a><a 
id="cmovz21234"></a><a 
id="cmovne21235"></a><a 
id="cmovnz21236"></a><a 
id="cmovbe21237"></a><a 
id="cmovna21238"></a><a 
id="cmova21239"></a><a 
id="cmovnbe21240"></a><a 
id="cmovs21241"></a><a 
id="cmovns21242"></a><a 
id="cmovp21243"></a><a 
id="cmovpe21244"></a><a 
id="cmovnp21245"></a><a 
id="cmovpo21246"></a><a 
id="cmovl21247"></a><a 
id="cmovnge21248"></a><a 
id="cmovge21249"></a><a 
id="cmovnl21250"></a><a 
id="cmovle21251"></a><a 
id="cmovng21252"></a><a 

id="cmovg21253"></a><a 
id="cmovnle21254"></a>The instructions obtained by attaching the condition mnemonic to
<tt>cmov</tt> mnemonic transfer the word or double word from the general
register or memory to the general register only when the condition is true.
The destination operand should be general register, the source operand can be
general register or memory.

<pre>
&nbsp;&nbsp;&nbsp;&nbsp;cmove&nbsp;ax,bx&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;;&nbsp;move&nbsp;when&nbsp;zero&nbsp;flag&nbsp;set
&nbsp;&nbsp;&nbsp;&nbsp;cmovnc&nbsp;eax,[ebx]&nbsp;;&nbsp;move&nbsp;when&nbsp;carry&nbsp;flag&nbsp;cleared

</pre>

<div class="p"><!----></div>
<a 
id="cmpxchg21255"></a><tt>cmpxchg</tt> compares the value in the <tt>al</tt>, <tt>ax</tt>, or <tt>eax</tt>
register with the destination operand. If the two values are equal,
the source operand is loaded into the destination operand. Otherwise,
the destination operand is loaded into the <tt>al</tt>, <tt>ax</tt>, or <tt>eax</tt>
register. The destination operand may be a general register or memory, the
source operand must be a general register.

<pre>
&nbsp;&nbsp;&nbsp;&nbsp;cmpxchg&nbsp;dl,bl&nbsp;&nbsp;&nbsp;&nbsp;;&nbsp;compare&nbsp;and&nbsp;exchange&nbsp;with&nbsp;register
&nbsp;&nbsp;&nbsp;&nbsp;cmpxchg&nbsp;[bx],dx&nbsp;&nbsp;;&nbsp;compare&nbsp;and&nbsp;exchange&nbsp;with&nbsp;memory

</pre>

<div class="p"><!----></div>
<a 
id="cmpxchg8b21256"></a><tt>cmpxchg8b</tt> compares the 64-bit value in <tt>edx</tt> and <tt>eax</tt>
registers with the destination operand. If the values are equal, the 64-bit
value in <tt>ecx</tt> and <tt>ebx</tt> registers is stored in the destination
operand. Otherwise, the value in the destination operand is loaded into
<tt>edx</tt> and <tt>eax</tt> registers. The destination operand should be a
quad word in memory.

<pre>
................................................................................
</pre>

<div class="p"><!----></div>
     <a id="tth_sEc2.1.11"></a><h3>
2.1.11&nbsp;&nbsp;Miscellaneous instructions</h3>

<div class="p"><!----></div>
<a 
id="nop21257"></a><tt>nop</tt> instruction occupies one byte but affects nothing but the
instruction pointer. This instruction has no operands and doesn't perform any
operation.

<div class="p"><!----></div>
<a 
id="ud221258"></a><tt>ud2</tt> instruction generates an invalid opcode exception. This
instruction is provided for software testing to explicitly generate an
invalid opcode. This instruction has no operands.

<div class="p"><!----></div>
<a 
id="xlat21259"></a><tt>xlat</tt> replaces a byte in the <tt>al</tt> register with a byte indexed by
its value in a translation table addressed by <tt>bx</tt> or <tt>ebx</tt>. The
operand should be a byte memory addressed by <tt>bx</tt> or <tt>ebx</tt> with any
segment prefix. This instruction has also a short form <tt>xlatb</tt> which has
no operands and uses the <tt>bx</tt> or <tt>ebx</tt> address in the segment
selected by <tt>ds</tt> depending on the current code setting.
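
<div class="p"><!----></div>
Going by the operand rules stated above, the forms below are a plausible
sketch of how this instruction is written; the registers and the segment
prefix are only illustrative.

<pre>
&nbsp;&nbsp;&nbsp;&nbsp;xlat&nbsp;byte&nbsp;[bx]&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;;&nbsp;translate&nbsp;al&nbsp;using&nbsp;the&nbsp;table&nbsp;at&nbsp;ds:bx
&nbsp;&nbsp;&nbsp;&nbsp;xlat&nbsp;byte&nbsp;[es:ebx]&nbsp;&nbsp;;&nbsp;translate&nbsp;al&nbsp;using&nbsp;the&nbsp;table&nbsp;at&nbsp;es:ebx
&nbsp;&nbsp;&nbsp;&nbsp;xlatb&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;;&nbsp;short&nbsp;form&nbsp;without&nbsp;operands

</pre>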

<div class="p"><!----></div>
<a 
id="lds21260"></a><a 
id="les21261"></a><a 
id="lfs21262"></a><a 
id="lgs21263"></a><a 
id="lss21264"></a><tt>lds</tt> transfers a pointer variable from the source operand to <tt>ds</tt>
and the destination register. The source operand must be a memory operand,
and the destination operand must be a general register. The <tt>ds</tt>
register receives the segment selector of the pointer while the destination
register receives the offset part of the pointer. <tt>les</tt>, <tt>lfs</tt>,
<tt>lgs</tt> and <tt>lss</tt> operate identically to <tt>lds</tt> except that
rather than <tt>ds</tt> register the <tt>es</tt>, <tt>fs</tt>, <tt>gs</tt> and
<tt>ss</tt> is used respectively.
................................................................................

<pre>
&nbsp;&nbsp;&nbsp;&nbsp;lds&nbsp;bx,[si]&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;;&nbsp;load&nbsp;pointer&nbsp;to&nbsp;ds:bx

</pre>

<div class="p"><!----></div>
<a 
id="lea21265"></a><tt>lea</tt> transfers the offset of the source operand (rather than its value)
to the destination operand. The source operand must be a memory operand, and
the destination operand must be a general register.

<pre>
&nbsp;&nbsp;&nbsp;&nbsp;lea&nbsp;dx,[bx+si+1]&nbsp;;&nbsp;load&nbsp;effective&nbsp;address&nbsp;to&nbsp;dx

</pre>

<div class="p"><!----></div>
<a 
id="cpuid21266"></a><tt>cpuid</tt> returns processor identification and feature information in the
<tt>eax</tt>, <tt>ebx</tt>, <tt>ecx</tt>, and <tt>edx</tt> registers. The information
returned is selected by entering a value in the <tt>eax</tt> register before
the instruction is executed. This instruction has no operands.
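
<div class="p"><!----></div>
As an illustration, the sketch below selects function 0; the meaning of
this function is taken from the processor documentation rather than from
this manual.

<pre>
&nbsp;&nbsp;&nbsp;&nbsp;xor&nbsp;eax,eax&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;;&nbsp;select&nbsp;function&nbsp;0
&nbsp;&nbsp;&nbsp;&nbsp;cpuid&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;;&nbsp;eax&nbsp;=&nbsp;highest&nbsp;function,&nbsp;ebx,&nbsp;edx,&nbsp;ecx&nbsp;=&nbsp;vendor&nbsp;string

</pre>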

<div class="p"><!----></div>
<a 
id="pause21267"></a><tt>pause</tt> instruction delays the execution of the next instruction by an
implementation-specific amount of time. It can be used to improve the
performance of spin wait loops. This instruction has no operands.
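
<div class="p"><!----></div>
A spin wait loop of this kind might be sketched as follows; the label
<tt>wait_busy</tt> and the byte variable <tt>busy</tt> are hypothetical names
introduced only for this example.

<pre>
wait_busy:
&nbsp;&nbsp;&nbsp;&nbsp;pause&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;;&nbsp;hint&nbsp;that&nbsp;this&nbsp;is&nbsp;a&nbsp;spin&nbsp;wait&nbsp;loop
&nbsp;&nbsp;&nbsp;&nbsp;cmp&nbsp;byte&nbsp;[busy],0&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;;&nbsp;busy&nbsp;is&nbsp;a&nbsp;hypothetical&nbsp;byte&nbsp;variable
&nbsp;&nbsp;&nbsp;&nbsp;jne&nbsp;wait_busy&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;;&nbsp;repeat&nbsp;until&nbsp;it&nbsp;becomes&nbsp;zero

</pre>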

<div class="p"><!----></div>
<a 
id="enter21268"></a><a 
id="leave21269"></a><tt>enter</tt> creates a stack frame that may be used to implement the scope
rules of block-structured high-level languages. A <tt>leave</tt> instruction
at the end of a procedure complements an <tt>enter</tt> at the beginning of the
procedure to simplify stack management and to control access to variables for
nested procedures. The <tt>enter</tt> instruction includes two parameters. The
first parameter specifies the number of bytes of dynamic storage to be
allocated on the stack for the routine being entered. The second parameter
corresponds to the lexical nesting level of the routine, it can be in range
................................................................................
</pre>

<div class="p"><!----></div>
     <a id="tth_sEc2.1.12"></a><h3>
2.1.12&nbsp;&nbsp;System instructions</h3>

<div class="p"><!----></div>
<a 
id="lmsw21270"></a><a 
id="smsw21271"></a><tt>lmsw</tt> loads the operand into the machine status word (bits 0 through 15
of <tt>cr0</tt> register), while <tt>smsw</tt> stores the machine status word
into the destination operand. The operand for both those instructions can be 16-bit
general register or memory, for <tt>smsw</tt> it can also be 32-bit general
register.

<pre>
&nbsp;&nbsp;&nbsp;&nbsp;lmsw&nbsp;ax&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;;&nbsp;load&nbsp;machine&nbsp;status&nbsp;from&nbsp;register
&nbsp;&nbsp;&nbsp;&nbsp;smsw&nbsp;[bx]&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;;&nbsp;store&nbsp;machine&nbsp;status&nbsp;to&nbsp;memory

</pre>

<div class="p"><!----></div>
<a 
id="lgdt21272"></a><a 
id="lidt21273"></a><a 
id="sgdt21274"></a><a 
id="sidt21275"></a><tt>lgdt</tt> and <tt>lidt</tt> instructions load the values in operand into the
global descriptor table register or the interrupt descriptor table register
respectively. <tt>sgdt</tt> and <tt>sidt</tt> store the contents of the global
descriptor table register or the interrupt descriptor table register
in the destination operand. The operand should be 6 bytes in memory.

<pre>
&nbsp;&nbsp;&nbsp;&nbsp;lgdt&nbsp;[ebx]&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;;&nbsp;load&nbsp;global&nbsp;descriptor&nbsp;table

</pre>

<div class="p"><!----></div>
<a 
id="lldt21276"></a><a 
id="sldt21277"></a><a 
id="ltr21278"></a><a 
id="str21279"></a><tt>lldt</tt> loads the operand into the segment selector field of
the local descriptor table register and <tt>sldt</tt> stores the
segment selector from the local descriptor table register in the
operand. <tt>ltr</tt> loads the operand into the segment selector
field of the task register and <tt>str</tt> stores the segment
selector from the task register in the operand. Rules for operand
are the same as for the <tt>lmsw</tt> and <tt>smsw</tt> instructions.
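
<div class="p"><!----></div>
Under the operand rules referred to above, these instructions might be
written as in the sketch below; the registers are only illustrative.

<pre>
&nbsp;&nbsp;&nbsp;&nbsp;lldt&nbsp;ax&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;;&nbsp;load&nbsp;local&nbsp;descriptor&nbsp;table&nbsp;selector&nbsp;from&nbsp;register
&nbsp;&nbsp;&nbsp;&nbsp;sldt&nbsp;word&nbsp;[bx]&nbsp;&nbsp;&nbsp;;&nbsp;store&nbsp;local&nbsp;descriptor&nbsp;table&nbsp;selector&nbsp;to&nbsp;memory
&nbsp;&nbsp;&nbsp;&nbsp;ltr&nbsp;ax&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;;&nbsp;load&nbsp;task&nbsp;register&nbsp;selector&nbsp;from&nbsp;register
&nbsp;&nbsp;&nbsp;&nbsp;str&nbsp;word&nbsp;[bx]&nbsp;&nbsp;&nbsp;&nbsp;;&nbsp;store&nbsp;task&nbsp;register&nbsp;selector&nbsp;to&nbsp;memory

</pre>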

<div class="p"><!----></div>
<a 
id="lar21280"></a><tt>lar</tt> loads the access rights from the segment descriptor
specified by the selector in source operand into the destination
operand and sets the ZF flag. The destination operand can be a
16-bit or 32-bit general register. The source operand should be a
16-bit general register or memory.

<pre>
&nbsp;&nbsp;&nbsp;&nbsp;lar&nbsp;ax,[bx]&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;;&nbsp;load&nbsp;access&nbsp;rights&nbsp;into&nbsp;word
&nbsp;&nbsp;&nbsp;&nbsp;lar&nbsp;eax,dx&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;;&nbsp;load&nbsp;access&nbsp;rights&nbsp;into&nbsp;double&nbsp;word

</pre>

<div class="p"><!----></div>
<a 
id="lsl21281"></a><tt>lsl</tt> loads the segment limit from the segment descriptor specified by
the selector in source operand into the destination operand and sets the ZF
flag. Rules for operand are the same as for the <tt>lar</tt> instruction.
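
<div class="p"><!----></div>
By analogy with the <tt>lar</tt> examples, a sketch of the corresponding
<tt>lsl</tt> forms:

<pre>
&nbsp;&nbsp;&nbsp;&nbsp;lsl&nbsp;ax,[bx]&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;;&nbsp;load&nbsp;segment&nbsp;limit&nbsp;into&nbsp;word
&nbsp;&nbsp;&nbsp;&nbsp;lsl&nbsp;eax,dx&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;;&nbsp;load&nbsp;segment&nbsp;limit&nbsp;into&nbsp;double&nbsp;word

</pre>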

<div class="p"><!----></div>
<a 
id="verr21282"></a><a 
id="verw21283"></a><tt>verr</tt> and <tt>verw</tt> verify whether the code or data segment specified
with the operand is readable or writable from the current privilege level.
The operand should be a word, it can be general register or memory.
If the segment is accessible and readable (for <tt>verr</tt>) or writable (for
<tt>verw</tt>) the ZF flag is set, otherwise it's cleared. Rules for operand
are the same as for the <tt>lldt</tt> instruction.

<div class="p"><!----></div>
<a 
id="arpl21284"></a><tt>arpl</tt> compares the RPL (requestor's privilege level) fields of two
segment selectors. The first operand contains one segment selector and the
second operand contains the other. If the RPL field of the destination
operand is less than the RPL field of the source operand, the ZF flag is set
and the RPL field of the destination operand is increased to match that of
the source operand. Otherwise, the ZF flag is cleared and no change is made
to the destination operand. The destination operand can be a word general
register or memory, the source operand must be a general register.
................................................................................
<pre>
&nbsp;&nbsp;&nbsp;&nbsp;arpl&nbsp;bx,ax&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;;&nbsp;adjust&nbsp;RPL&nbsp;of&nbsp;selector&nbsp;in&nbsp;register
&nbsp;&nbsp;&nbsp;&nbsp;arpl&nbsp;[bx],ax&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;;&nbsp;adjust&nbsp;RPL&nbsp;of&nbsp;selector&nbsp;in&nbsp;memory

</pre>

<div class="p"><!----></div>
<a 
id="clts21285"></a><tt>clts</tt> clears the TS (task switched) flag in the <tt>cr0</tt> register.
This instruction has no operands.

<div class="p"><!----></div>
<a 
id="lock21286"></a><tt>lock</tt> prefix causes the processor's bus-lock signal to be asserted during
execution of the accompanying instruction. In a multiprocessor environment,
the bus-lock signal ensures that the processor has exclusive use of any shared
memory while the signal is asserted. The <tt>lock</tt> prefix can be prepended
only to the following instructions and only to those forms of the
instructions where the destination operand is a memory operand: <tt>add</tt>,
<tt>adc</tt>, <tt>and</tt>, <tt>btc</tt>, <tt>btr</tt>, <tt>bts</tt>, <tt>cmpxchg</tt>,
<tt>cmpxchg8b</tt>, <tt>dec</tt>, <tt>inc</tt>, <tt>neg</tt>, <tt>not</tt>, <tt>or</tt>,
................................................................................
operand is a memory operand, an undefined opcode exception may be generated.
An undefined opcode exception will also be generated if the <tt>lock</tt>
prefix is used with any instruction not in the above list. The <tt>xchg</tt>
instruction always asserts the bus-lock signal regardless of the presence or
absence of the <tt>lock</tt> prefix.
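
<div class="p"><!----></div>
Two forms with a memory destination, given here only as a sketch with
illustrative registers:

<pre>
&nbsp;&nbsp;&nbsp;&nbsp;lock&nbsp;inc&nbsp;dword&nbsp;[ebx]&nbsp;&nbsp;&nbsp;&nbsp;;&nbsp;atomic&nbsp;increment&nbsp;of&nbsp;double&nbsp;word&nbsp;in&nbsp;memory
&nbsp;&nbsp;&nbsp;&nbsp;lock&nbsp;cmpxchg&nbsp;[ebx],ecx&nbsp;&nbsp;;&nbsp;atomic&nbsp;compare&nbsp;and&nbsp;exchange&nbsp;with&nbsp;memory

</pre>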

<div class="p"><!----></div>
<a 
id="hlt21287"></a><tt>hlt</tt> stops instruction execution and places the processor in a halted
state. An enabled interrupt, a debug exception, the BINIT, INIT or the RESET
signal will resume execution. This instruction has no operands.

<div class="p"><!----></div>
<a 
id="invlpg21288"></a><tt>invlpg</tt> invalidates (flushes) the TLB (translation lookaside buffer)
entry specified with the operand, which should be a memory operand. The processor
determines the page that contains that address and flushes the TLB entry for
that page.

<div class="p"><!----></div>
<a 
id="rdmsr21289"></a><a 
id="wrmsr21290"></a><tt>rdmsr</tt> loads the contents of a 64-bit MSR (model specific register)
of the address specified in the <tt>ecx</tt> register into registers <tt>edx</tt>
and <tt>eax</tt>. <tt>wrmsr</tt> writes the contents of registers <tt>edx</tt> and
<tt>eax</tt> into the 64-bit MSR of the address specified in the <tt>ecx</tt>
register. <tt>rdtsc</tt> loads the current value of the processor's time stamp
counter from the 64-bit MSR into the <tt>edx</tt> and <tt>eax</tt> registers.
The processor increments the time stamp counter MSR every clock cycle and
resets it to 0 whenever the processor is reset.
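
<div class="p"><!----></div>
A sketch of reading the time stamp counter and storing it as a quad word,
assuming <tt>ebx</tt> points to an 8-byte buffer:

<pre>
&nbsp;&nbsp;&nbsp;&nbsp;rdtsc&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;;&nbsp;edx&nbsp;and&nbsp;eax&nbsp;=&nbsp;time&nbsp;stamp&nbsp;counter
&nbsp;&nbsp;&nbsp;&nbsp;mov&nbsp;[ebx],eax&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;;&nbsp;store&nbsp;the&nbsp;low&nbsp;double&nbsp;word
&nbsp;&nbsp;&nbsp;&nbsp;mov&nbsp;[ebx+4],edx&nbsp;&nbsp;&nbsp;&nbsp;;&nbsp;store&nbsp;the&nbsp;high&nbsp;double&nbsp;word

</pre>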

<div class="p"><!----></div>
<a 
id="rdpmc21291"></a><tt>rdpmc</tt> loads the contents of the 40-bit performance monitoring counter
specified in the <tt>ecx</tt> register into registers <tt>edx</tt> and
<tt>eax</tt>. These instructions have no operands.

<div class="p"><!----></div>
<a 
id="wbinvd21292"></a><tt>wbinvd</tt> writes back all modified cache lines in the processor's
internal cache to main memory and invalidates (flushes) the internal caches.
The instruction then issues a special function bus cycle that directs
external caches to also write back modified data and another bus cycle to
indicate that the external caches should be invalidated. This instruction has
no operands.

<div class="p"><!----></div>
<a 
id="rsm21293"></a><tt>rsm</tt> returns program control from the system management mode to the
program that was interrupted when the processor received an SMM interrupt.
This instruction has no operands.

<div class="p"><!----></div>
<a 
id="sysenter21294"></a><a 
id="sysexit21295"></a><tt>sysenter</tt> executes a fast call to a level 0 system procedure, <tt>sysexit</tt>
executes a fast return to level 3 user code. The addresses used by these instructions
are stored in MSRs. These instructions have no operands.

<div class="p"><!----></div>
     <a id="tth_sEc2.1.13"></a><h3>
2.1.13&nbsp;&nbsp;FPU instructions</h3>

................................................................................
the stack and each of them holds the double extended precision floating-point
value. When some values are pushed onto the stack or are removed from the top,
the FPU registers are shifted, so <tt>st0</tt> is always the value on
the top of FPU stack, <tt>st1</tt> is the first value below the top, etc.
The <tt>st0</tt> name has also the synonym <tt>st</tt>.

<div class="p"><!----></div>
<a 
id="fld21296"></a><tt>fld</tt> pushes the floating-point value onto the FPU register stack.
The operand can be 32-bit, 64-bit or 80-bit memory location or the
FPU register, its value is then loaded onto the top of FPU register stack
(the <tt>st0</tt> register) and is automatically converted into the
double extended precision format.

<pre>
&nbsp;&nbsp;&nbsp;&nbsp;fld&nbsp;dword&nbsp;[bx]&nbsp;&nbsp;&nbsp;;&nbsp;load&nbsp;single&nbsp;precision&nbsp;value&nbsp;from&nbsp;memory
&nbsp;&nbsp;&nbsp;&nbsp;fld&nbsp;st2&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;;&nbsp;push&nbsp;value&nbsp;of&nbsp;st2&nbsp;onto&nbsp;register&nbsp;stack

</pre>

<div class="p"><!----></div>
<a 
id="fld121297"></a><a 
id="lfdz21298"></a><a 
id="ldl2t21299"></a><a 
id="lfdl2e21300"></a><a 

id="fldpi21301"></a><a 
id="fldlg221302"></a><a 
id="fldln221303"></a><tt>fld1</tt>, <tt>fldz</tt>, <tt>fldl2t</tt>, <tt>fldl2e</tt>, <tt>fldpi</tt>,
<tt>fldlg2</tt> and <tt>fldln2</tt> load the commonly used constants onto the
FPU register stack. The loaded constants are +1.0, +0.0, log<sub>2</sub>10,
log<sub>2</sub>e, &#960;, log<sub>10</sub>2 and ln2 respectively. These instructions
have no operands.
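
<div class="p"><!----></div>
Since each of these instructions pushes a value, earlier values move down
the register stack, as the following sketch illustrates:

<pre>
&nbsp;&nbsp;&nbsp;&nbsp;fldpi&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;;&nbsp;st0&nbsp;=&nbsp;pi
&nbsp;&nbsp;&nbsp;&nbsp;fld1&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;;&nbsp;st0&nbsp;=&nbsp;1.0,&nbsp;st1&nbsp;=&nbsp;pi
&nbsp;&nbsp;&nbsp;&nbsp;fldz&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;;&nbsp;st0&nbsp;=&nbsp;0.0,&nbsp;st1&nbsp;=&nbsp;1.0,&nbsp;st2&nbsp;=&nbsp;pi

</pre>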

<div class="p"><!----></div>
<a 
id="fild21304"></a><tt>fild</tt> converts the signed integer source operand into double extended
precision floating-point format and pushes the result onto the FPU register
stack. The source operand can be a 16-bit, 32-bit or 64-bit memory location.

<pre>
&nbsp;&nbsp;&nbsp;&nbsp;fild&nbsp;qword&nbsp;[bx]&nbsp;&nbsp;;&nbsp;load&nbsp;64-bit&nbsp;integer&nbsp;from&nbsp;memory

</pre>

<div class="p"><!----></div>
<a 
id="fst21305"></a><a 
id="fstp21306"></a><tt>fst</tt> copies the value of <tt>st0</tt> register to the destination operand,
which can be 32-bit or 64-bit memory location or another FPU register.
<tt>fstp</tt> performs the same operation as <tt>fst</tt> and then pops the register
stack, getting rid of <tt>st0</tt>. <tt>fstp</tt> accepts the same operands as
the <tt>fst</tt> instruction and can also store value in the 80-bit memory.

<pre>
&nbsp;&nbsp;&nbsp;&nbsp;fst&nbsp;st3&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;;&nbsp;copy&nbsp;value&nbsp;of&nbsp;st0&nbsp;into&nbsp;st3&nbsp;register
&nbsp;&nbsp;&nbsp;&nbsp;fstp&nbsp;tword&nbsp;[bx]&nbsp;&nbsp;;&nbsp;store&nbsp;value&nbsp;in&nbsp;memory&nbsp;and&nbsp;pop&nbsp;stack

</pre>

<div class="p"><!----></div>
<a 
id="fist21307"></a><tt>fist</tt> converts the value in <tt>st0</tt> to a signed integer and stores
the result in the destination operand. The operand can be 16-bit or
32-bit memory location. <tt>fistp</tt> performs the same operation and then
pops the register stack, it accepts the same operands as the <tt>fist</tt>
instruction and can also store integer value in the 64-bit memory, so it
has the same rules for operands as <tt>fild</tt> instruction.
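
<div class="p"><!----></div>
A sketch of both variants, with <tt>bx</tt> used only as an illustrative
address register:

<pre>
&nbsp;&nbsp;&nbsp;&nbsp;fist&nbsp;word&nbsp;[bx]&nbsp;&nbsp;&nbsp;&nbsp;;&nbsp;store&nbsp;16-bit&nbsp;integer
&nbsp;&nbsp;&nbsp;&nbsp;fistp&nbsp;qword&nbsp;[bx]&nbsp;&nbsp;;&nbsp;store&nbsp;64-bit&nbsp;integer&nbsp;and&nbsp;pop&nbsp;stack

</pre>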

<div class="p"><!----></div>
<a 
id="fbld21308"></a><tt>fbld</tt> converts the packed BCD integer into double extended precision
floating-point format and pushes this value onto the FPU stack. <tt>fbstp</tt>
converts the value in <tt>st0</tt> to an 18-digit packed BCD integer, stores the
result in the destination operand, and pops the register stack. The operand
should be an 80-bit memory location.

<div class="p"><!----></div>
<a 
id="fadd21309"></a><tt>fadd</tt> adds the destination and source operand and stores the sum in the
destination location. The destination operand is always an FPU register, if the
source is a memory location, the destination is <tt>st0</tt> register and only
source operand should be specified. If both operands are FPU registers, at
least one of them should be <tt>st0</tt> register. An operand in memory can be
a 32-bit or 64-bit value.

<pre>
&nbsp;&nbsp;&nbsp;&nbsp;fadd&nbsp;qword&nbsp;[bx]&nbsp;&nbsp;;&nbsp;add&nbsp;double&nbsp;precision&nbsp;value&nbsp;to&nbsp;st0
&nbsp;&nbsp;&nbsp;&nbsp;fadd&nbsp;st2,st0&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;;&nbsp;add&nbsp;st0&nbsp;to&nbsp;st2

</pre>

<div class="p"><!----></div>
<a 
id="faddp21310"></a><tt>faddp</tt> adds the destination and source operand, stores the sum in the
destination location and then pops the register stack. The destination operand
must be an FPU register and the source operand must be the <tt>st0</tt>. When
no operands are specified, <tt>st1</tt> is used as a destination operand.

<pre>
&nbsp;&nbsp;&nbsp;&nbsp;faddp&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;;&nbsp;add&nbsp;st0&nbsp;to&nbsp;st1&nbsp;and&nbsp;pop&nbsp;the&nbsp;stack
&nbsp;&nbsp;&nbsp;&nbsp;faddp&nbsp;st2,st0&nbsp;&nbsp;&nbsp;&nbsp;;&nbsp;add&nbsp;st0&nbsp;to&nbsp;st2&nbsp;and&nbsp;pop&nbsp;the&nbsp;stack

</pre>

<div class="p"><!----></div>
<a 
id="fiadd21311"></a><tt>fiadd</tt> instruction converts an integer source operand into double
extended precision floating-point value and adds it to the destination
operand. The operand should be a 16-bit or 32-bit memory location.

<pre>
&nbsp;&nbsp;&nbsp;&nbsp;fiadd&nbsp;word&nbsp;[bx]&nbsp;&nbsp;;&nbsp;add&nbsp;word&nbsp;integer&nbsp;to&nbsp;st0

</pre>

<div class="p"><!----></div>
<a 
id="fsub21312"></a><a 
id="fsubr21313"></a><a 
id="fmul21314"></a><a 
id="fdiv21315"></a><a 
id="fdivr21316"></a><tt>fsub</tt>, <tt>fsubr</tt>, <tt>fmul</tt>, <tt>fdiv</tt>, <tt>fdivr</tt> instructions
are similar to <tt>fadd</tt>, have the same rules for operands and differ only in
the performed computation. <tt>fsub</tt> subtracts the source operand from the
destination operand, <tt>fsubr</tt> subtracts the destination operand from the
source operand, <tt>fmul</tt> multiplies the destination and source operands,
<tt>fdiv</tt> divides the destination operand by the source operand and <tt>fdivr</tt>
divides the source operand by the destination operand. <tt>fsubp</tt>, <tt>fsubrp</tt>,
<tt>fmulp</tt>, <tt>fdivp</tt>, <tt>fdivrp</tt> perform the same operations and pop the
register stack, the rules for operand are the same as for the <tt>faddp</tt>
instruction. <tt>fisub</tt>, <tt>fisubr</tt>, <tt>fimul</tt>, <tt>fidiv</tt>, <tt>fidivr</tt>
perform these operations after converting the integer source operand into
floating-point value, they have the same rules for operands as <tt>fiadd</tt>
instruction.
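
<div class="p"><!----></div>
A few forms from this family, sketched by analogy with the <tt>fadd</tt>,
<tt>faddp</tt> and <tt>fiadd</tt> examples above:

<pre>
&nbsp;&nbsp;&nbsp;&nbsp;fsub&nbsp;dword&nbsp;[bx]&nbsp;&nbsp;;&nbsp;subtract&nbsp;single&nbsp;precision&nbsp;value&nbsp;from&nbsp;st0
&nbsp;&nbsp;&nbsp;&nbsp;fdivr&nbsp;st0,st3&nbsp;&nbsp;&nbsp;&nbsp;;&nbsp;divide&nbsp;st3&nbsp;by&nbsp;st0&nbsp;and&nbsp;store&nbsp;result&nbsp;in&nbsp;st0
&nbsp;&nbsp;&nbsp;&nbsp;fmulp&nbsp;st1,st0&nbsp;&nbsp;&nbsp;&nbsp;;&nbsp;multiply&nbsp;st1&nbsp;by&nbsp;st0&nbsp;and&nbsp;pop&nbsp;the&nbsp;stack
&nbsp;&nbsp;&nbsp;&nbsp;fisub&nbsp;word&nbsp;[bx]&nbsp;&nbsp;;&nbsp;subtract&nbsp;word&nbsp;integer&nbsp;from&nbsp;st0

</pre>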

<div class="p"><!----></div>
<a 
id="fsqrt21317"></a><a 
id="fsin21318"></a><a 
id="fcos21319"></a><a 
id="fchs21320"></a><a 
id="fabs21321"></a><a 
id="frndint21322"></a><a 

id="f2xm121323"></a><tt>fsqrt</tt> computes the square root of the value in <tt>st0</tt> register,
<tt>fsin</tt> computes the sine of that value, <tt>fcos</tt> computes the cosine
of that value, <tt>fchs</tt> complements its sign bit, <tt>fabs</tt> clears its sign to
create the absolute value, <tt>frndint</tt> rounds it to the nearest integral value,
depending on the current rounding mode. <tt>f2xm1</tt> computes the exponential value
of 2 to the power of <tt>st0</tt> and subtracts 1.0 from it, the value of
<tt>st0</tt> must lie in the range &#8722;1.0 to +1.0.
All these instructions store the result in <tt>st0</tt> and have no operands.
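
<div class="p"><!----></div>
Because each of these instructions replaces <tt>st0</tt> with its result,
they can be chained, as in this sketch (the memory operand is only
illustrative):

<pre>
&nbsp;&nbsp;&nbsp;&nbsp;fld&nbsp;dword&nbsp;[bx]&nbsp;&nbsp;&nbsp;;&nbsp;push&nbsp;single&nbsp;precision&nbsp;value&nbsp;onto&nbsp;the&nbsp;stack
&nbsp;&nbsp;&nbsp;&nbsp;fabs&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;;&nbsp;st0&nbsp;=&nbsp;its&nbsp;absolute&nbsp;value
&nbsp;&nbsp;&nbsp;&nbsp;fsqrt&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;;&nbsp;st0&nbsp;=&nbsp;square&nbsp;root&nbsp;of&nbsp;the&nbsp;absolute&nbsp;value
&nbsp;&nbsp;&nbsp;&nbsp;frndint&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;;&nbsp;round&nbsp;st0&nbsp;to&nbsp;an&nbsp;integral&nbsp;value

</pre>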

<div class="p"><!----></div>
<a 
id="fsincos21324"></a><a 

id="fptan21325"></a><a 
id="fpatan21326"></a><a 
id="fyl2x21327"></a><a 
id="fyl2xp121328"></a><a 
id="fprem21329"></a><a 
id="fprem121330"></a><a 
id="fscale21331"></a><a 
id="fxtract21332"></a><a 

id="fnop21333"></a><tt>fsincos</tt> computes both the sine and the cosine of the value in
<tt>st0</tt> register, stores the sine in <tt>st0</tt> and pushes the cosine on the
top of FPU register stack. <tt>fptan</tt> computes the tangent of the value in
<tt>st0</tt>, stores the result in <tt>st0</tt> and pushes a 1.0 onto the FPU register
stack. <tt>fpatan</tt> computes the arctangent of the value in <tt>st1</tt> divided by
the value in <tt>st0</tt>, stores the result in <tt>st1</tt> and pops the FPU register
stack. <tt>fyl2x</tt> computes the binary logarithm of <tt>st0</tt>, multiplies it by
<tt>st1</tt>, stores the result in <tt>st1</tt> and pops the FPU register stack;
................................................................................
computes the remainder in the way specified by IEEE Standard 754. <tt>fscale</tt>
truncates the value in <tt>st1</tt> and increases the exponent of <tt>st0</tt> by this value.
<tt>fxtract</tt> separates the value in <tt>st0</tt> into its exponent and significand,
stores the exponent in <tt>st0</tt> and pushes the significand onto the register
stack. <tt>fnop</tt> performs no operation. These instructions have no operands.
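
<div class="p"><!----></div>
The stack effects of two of these instructions can be followed in the
sketch below; the memory operand is only illustrative and is assumed to
hold a positive double precision value.

<pre>
&nbsp;&nbsp;&nbsp;&nbsp;fldpi&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;;&nbsp;st0&nbsp;=&nbsp;pi
&nbsp;&nbsp;&nbsp;&nbsp;fsincos&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;;&nbsp;st1&nbsp;=&nbsp;sine,&nbsp;st0&nbsp;=&nbsp;cosine&nbsp;of&nbsp;that&nbsp;value
&nbsp;&nbsp;&nbsp;&nbsp;fld1&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;;&nbsp;push&nbsp;1.0
&nbsp;&nbsp;&nbsp;&nbsp;fld&nbsp;qword&nbsp;[bx]&nbsp;&nbsp;&nbsp;;&nbsp;push&nbsp;the&nbsp;value&nbsp;from&nbsp;memory,&nbsp;st1&nbsp;=&nbsp;1.0
&nbsp;&nbsp;&nbsp;&nbsp;fyl2x&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;;&nbsp;pop,&nbsp;st0&nbsp;=&nbsp;1.0&nbsp;times&nbsp;binary&nbsp;logarithm&nbsp;of&nbsp;the&nbsp;value

</pre>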

<div class="p"><!----></div>
<a 
id="fxch21334"></a><tt>fxch</tt> exchanges the contents of <tt>st0</tt> and another FPU register. The
operand should be an FPU register, if no operand is specified, the contents of
<tt>st0</tt> and <tt>st1</tt> are exchanged.
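
<div class="p"><!----></div>
Both forms, as a short sketch:

<pre>
&nbsp;&nbsp;&nbsp;&nbsp;fxch&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;;&nbsp;exchange&nbsp;st0&nbsp;with&nbsp;st1
&nbsp;&nbsp;&nbsp;&nbsp;fxch&nbsp;st3&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;;&nbsp;exchange&nbsp;st0&nbsp;with&nbsp;st3

</pre>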

<div class="p"><!----></div>
<a 
id="fcom21335"></a><a 
id="fcomp21336"></a><tt>fcom</tt> and <tt>fcomp</tt> compare the contents of <tt>st0</tt> and the source
operand and set flags in the FPU status word according to the results.
<tt>fcomp</tt> additionally pops the register stack after performing the comparison.
The operand can be a single or double precision value in memory or the FPU register.
When no operand is specified, <tt>st1</tt> is used as a source operand.

<pre>
&nbsp;&nbsp;&nbsp;&nbsp;fcom&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;;&nbsp;compare&nbsp;st0&nbsp;with&nbsp;st1
&nbsp;&nbsp;&nbsp;&nbsp;fcomp&nbsp;st2&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;;&nbsp;compare&nbsp;st0&nbsp;with&nbsp;st2&nbsp;and&nbsp;pop&nbsp;stack

</pre>

<div class="p"><!----></div>
<a 
id="fcompp21337"></a><tt>fcompp</tt> compares the contents of <tt>st0</tt> and <tt>st1</tt>, sets flags in the
FPU status word according to the results and pops the register stack twice.
This instruction has no operands.

<div class="p"><!----></div>
<a 
id="fucom21338"></a><a 
id="fucomp21339"></a><a 
id="fucompp21340"></a><tt>fucom</tt>, <tt>fucomp</tt> and <tt>fucompp</tt> perform an unordered comparison of
two FPU registers. Rules for operands are the same as for the <tt>fcom</tt>,
<tt>fcomp</tt> and <tt>fcompp</tt>, but the source operand must be an FPU register.
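
<div class="p"><!----></div>
By analogy with the <tt>fcom</tt> examples, the unordered variants might be
written as sketched here:

<pre>
&nbsp;&nbsp;&nbsp;&nbsp;fucom&nbsp;st2&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;;&nbsp;unordered&nbsp;compare&nbsp;of&nbsp;st0&nbsp;with&nbsp;st2
&nbsp;&nbsp;&nbsp;&nbsp;fucomp&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;;&nbsp;unordered&nbsp;compare&nbsp;of&nbsp;st0&nbsp;with&nbsp;st1&nbsp;and&nbsp;pop&nbsp;stack
&nbsp;&nbsp;&nbsp;&nbsp;fucompp&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;;&nbsp;unordered&nbsp;compare&nbsp;of&nbsp;st0&nbsp;with&nbsp;st1&nbsp;and&nbsp;pop&nbsp;stack&nbsp;twice

</pre>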

<div class="p"><!----></div>
<a 
id="ficom21341"></a><a 
id="ficomp21342"></a><tt>ficom</tt> and <tt>ficomp</tt> compare the value in <tt>st0</tt> with an integer
source operand and set the flags in the FPU status word according to the results.
<tt>ficomp</tt> additionally pops the register stack after performing the comparison.
The integer value is converted to double extended precision floating-point format
before the comparison is made. The operand should be a 16-bit or 32-bit memory
location.

<pre>
&nbsp;&nbsp;&nbsp;&nbsp;ficom&nbsp;word&nbsp;[bx]&nbsp;&nbsp;;&nbsp;compare&nbsp;st0&nbsp;with&nbsp;16-bit&nbsp;integer

</pre>

<div class="p"><!----></div>
<a 
id="fcomi21343"></a><a 
id="fcomip21344"></a><a 
id="fucomi21345"></a><a 
id="fucomip21346"></a><tt>fcomi</tt>, <tt>fcomip</tt>, <tt>fucomi</tt>, <tt>fucomip</tt> perform the comparison
of <tt>st0</tt> with another FPU register and set the ZF, PF and CF flags according to
the results. <tt>fcomip</tt> and <tt>fucomip</tt> additionally pop the register stack
after performing the comparison.
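
<div class="p"><!----></div>
A minimal sketch of branching directly on such a comparison is given below. It relies
on the flags being set in the same way as for an unsigned integer comparison, with an
unordered result setting ZF, PF and CF together, so PF is tested first. The labels
<tt>unordered</tt>, <tt>below</tt> and <tt>equal</tt> are hypothetical and assumed to be
defined elsewhere.

<pre>
&nbsp;&nbsp;&nbsp;&nbsp;fcomi&nbsp;st0,st1&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;;&nbsp;set&nbsp;ZF,&nbsp;PF&nbsp;and&nbsp;CF&nbsp;from&nbsp;st0&nbsp;compared&nbsp;with&nbsp;st1
&nbsp;&nbsp;&nbsp;&nbsp;jp&nbsp;unordered&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;;&nbsp;PF&nbsp;=&nbsp;1:&nbsp;the&nbsp;values&nbsp;are&nbsp;unordered
&nbsp;&nbsp;&nbsp;&nbsp;jb&nbsp;below&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;;&nbsp;CF&nbsp;=&nbsp;1:&nbsp;st0&nbsp;is&nbsp;below&nbsp;st1
&nbsp;&nbsp;&nbsp;&nbsp;je&nbsp;equal&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;;&nbsp;ZF&nbsp;=&nbsp;1:&nbsp;st0&nbsp;is&nbsp;equal&nbsp;to&nbsp;st1
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;;&nbsp;otherwise&nbsp;st0&nbsp;is&nbsp;above&nbsp;st1

</pre>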

<div class="p"><!----></div>
<a 
id="fcmovb21347"></a><a 
id="fcmove21348"></a><a 
id="fcmovbe21349"></a><a 
id="fcmovu21350"></a><a 
id="fcmovnb21351"></a><a 
id="fcmovne21352"></a><a 
id="fcmovnbe21353"></a><a 

id="fcmovnu21354"></a>The instructions obtained by attaching the FPU
condition mnemonic (see table 2.2) to the <tt>fcmov</tt> mnemonic
transfer the specified FPU register into the <tt>st0</tt> register if the given test
condition is true. These instructions allow two different syntaxes: one with a single
operand specifying the source FPU register, and one with two operands, in which case
the destination operand should be the <tt>st0</tt> register and the second operand specifies
the source FPU register.

................................................................................

<div class="p"><!----></div>

<div class="p"><!----></div>
<a id="tth_tAb2.2">
</a> 
<div style="text-align:center">
<table border="1">
<tr><td align="center">Mnemonic </td><td align="center">Condition tested </td><td align="center">Description </td></tr><tr><td></td></tr>
<tr><td align="center"><tt>b</tt> </td><td align="center">CF = 1 </td><td align="center">below</td></tr>
<tr><td align="center"><tt>e</tt> </td><td align="center">ZF = 1 </td><td align="center">equal</td></tr>
<tr><td align="center"><tt>be</tt> </td><td align="center">CF <tt>or</tt> ZF = 1 </td><td align="center">below or equal</td></tr>
<tr><td align="center"><tt>u</tt> </td><td align="center">PF = 1 </td><td align="center">unordered</td></tr>
<tr><td align="center"><tt>nb</tt> </td><td align="center">CF = 0 </td><td align="center">not below</td></tr>
<tr><td align="center"><tt>ne</tt> </td><td align="center">ZF = 0 </td><td align="center">not equal</td></tr>
................................................................................
</div>

<div style="text-align:center">Table 2.2: FPU conditions.</div>
<a id="tab:FPU_conditions">
</a>

<div class="p"><!----></div>
<a 
id="ftst21355"></a><a 
id="fxam21356"></a><tt>ftst</tt> compares the value in <tt>st0</tt> with 0.0 and sets the flags in the
FPU status word according to the results. <tt>fxam</tt> examines the contents of
<tt>st0</tt> and sets the flags in the FPU status word to indicate the class of value in
the register. These instructions have no operands.

<div class="p"><!----></div>
<a 
id="fstsw21357"></a><a 
id="fnstsw21358"></a><tt>fstsw</tt> and <tt>fnstsw</tt> store the current value of the FPU status word in the
destination location. The destination operand can be either a 16-bit memory location or the
<tt>ax</tt> register. <tt>fstsw</tt> checks for pending unmasked FPU exceptions before
storing the status word, <tt>fnstsw</tt> does not.
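
<div class="p"><!----></div>
One common use of the <tt>ax</tt> form, sketched here for 16-bit or 32-bit code, is to
move the result of an FPU comparison into the CPU flags so that ordinary conditional
jumps can be used. The sketch relies on the position of the condition code bits in
the status word (C0, C2 and C3 land in CF, PF and ZF after <tt>sahf</tt>); the labels
<tt>unordered</tt> and <tt>below</tt> are hypothetical.

<pre>
&nbsp;&nbsp;&nbsp;&nbsp;fcom&nbsp;st1&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;;&nbsp;compare&nbsp;st0&nbsp;with&nbsp;st1
&nbsp;&nbsp;&nbsp;&nbsp;fnstsw&nbsp;ax&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;;&nbsp;copy&nbsp;the&nbsp;FPU&nbsp;status&nbsp;word&nbsp;to&nbsp;ax
&nbsp;&nbsp;&nbsp;&nbsp;sahf&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;;&nbsp;copy&nbsp;ah&nbsp;to&nbsp;the&nbsp;CPU&nbsp;flags:&nbsp;C0&nbsp;to&nbsp;CF,&nbsp;C2&nbsp;to&nbsp;PF,&nbsp;C3&nbsp;to&nbsp;ZF
&nbsp;&nbsp;&nbsp;&nbsp;jp&nbsp;unordered&nbsp;&nbsp;;&nbsp;the&nbsp;values&nbsp;are&nbsp;unordered
&nbsp;&nbsp;&nbsp;&nbsp;jb&nbsp;below&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;;&nbsp;st0&nbsp;is&nbsp;below&nbsp;st1

</pre>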

<div class="p"><!----></div>
<a 
id="fstcw21359"></a><a 
id="fnstcw21360"></a><tt>fstcw</tt> and <tt>fnstcw</tt> store the current value of the FPU control word
at the specified destination in memory. <tt>fstcw</tt> checks for pending unmasked FPU
exceptions before storing the control word, <tt>fnstcw</tt> does not. <tt>fldcw</tt> loads
the operand into the FPU control word. The operand should be a 16-bit memory
location.
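
<div class="p"><!----></div>
A typical use of these instructions is a temporary change of the rounding mode. The
sketch below assumes two hypothetical 16-bit variables, <tt>cw_old</tt> and
<tt>cw_new</tt>, declared elsewhere with <tt>dw</tt>, and relies on the rounding control
field occupying bits 10 and 11 of the control word, where the value 11b selects
truncation.

<pre>
&nbsp;&nbsp;&nbsp;&nbsp;fstcw&nbsp;[cw_old]&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;;&nbsp;save&nbsp;the&nbsp;current&nbsp;control&nbsp;word
&nbsp;&nbsp;&nbsp;&nbsp;mov&nbsp;ax,[cw_old]
&nbsp;&nbsp;&nbsp;&nbsp;or&nbsp;ax,0C00h&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;;&nbsp;set&nbsp;bits&nbsp;10&nbsp;and&nbsp;11&nbsp;(rounding&nbsp;control):&nbsp;truncate
&nbsp;&nbsp;&nbsp;&nbsp;mov&nbsp;[cw_new],ax
&nbsp;&nbsp;&nbsp;&nbsp;fldcw&nbsp;[cw_new]&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;;&nbsp;load&nbsp;the&nbsp;modified&nbsp;control&nbsp;word
&nbsp;&nbsp;&nbsp;&nbsp;fistp&nbsp;dword&nbsp;[ebx]&nbsp;&nbsp;&nbsp;;&nbsp;store&nbsp;st0&nbsp;as&nbsp;a&nbsp;truncated&nbsp;integer&nbsp;and&nbsp;pop
&nbsp;&nbsp;&nbsp;&nbsp;fldcw&nbsp;[cw_old]&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;;&nbsp;restore&nbsp;the&nbsp;original&nbsp;control&nbsp;word

</pre>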

<div class="p"><!----></div>
<a 
id="fstenv21361"></a><a 
id="fnstenv21362"></a><a 
id="fldenv21363"></a><a 
id="fsave21364"></a><a 

id="fnsave21365"></a><a 
id="frstor21366"></a><a 
id="fstenvw21367"></a><a 
id="fnstenvw21368"></a><a 
id="fldenvw21369"></a><a 
id="fsavew21370"></a><a 
id="fnsavew21371"></a><a 
id="frstorw21372"></a><a 
id="fstenvd21373"></a><a 
id="fnstenvd21374"></a><a 
id="fldenvd21375"></a><a 
id="fsaved21376"></a><a 
id="fnsaved21377"></a><a 

id="frstord21378"></a><tt>fstenv</tt> and <tt>fnstenv</tt> store the current FPU operating environment at
the memory location specified with the destination operand, and then mask all
FPU exceptions. <tt>fstenv</tt> checks for pending unmasked FPU exceptions before
proceeding, <tt>fnstenv</tt> does not. <tt>fldenv</tt> loads the complete operating
environment from memory into the FPU. <tt>fsave</tt> and <tt>fnsave</tt>
store the current FPU state (operating environment and register stack) at the
specified destination in memory and reinitialize the FPU. <tt>fsave</tt> checks
for pending unmasked FPU exceptions before proceeding, <tt>fnsave</tt> does not.
................................................................................
exist two additional mnemonics that allow the type of the operation to be selected
precisely. The <tt>fstenvw</tt>, <tt>fnstenvw</tt>, <tt>fldenvw</tt>, <tt>fsavew</tt>, <tt>fnsavew</tt> and
<tt>frstorw</tt> mnemonics force the instruction to perform the operation as in 16-bit
mode, while <tt>fstenvd</tt>, <tt>fnstenvd</tt>, <tt>fldenvd</tt>, <tt>fsaved</tt>, <tt>fnsaved</tt> and <tt>frstord</tt>
force the operation as in 32-bit mode.

<div class="p"><!----></div>
<a 
id="finit21379"></a><a 
id="fninit21380"></a><a 
id="fclex21381"></a><a 
id="fnclex21382"></a><a 

id="wait21383"></a><a 
id="fwait21384"></a><tt>finit</tt> and <tt>fninit</tt> set the FPU operating environment into its default
state. <tt>finit</tt> checks for pending unmasked FPU exceptions before proceeding,
<tt>fninit</tt> does not. <tt>fclex</tt> and <tt>fnclex</tt> clear the FPU exception flags in the FPU
status word. <tt>fclex</tt> checks for pending unmasked FPU exceptions before proceeding,
<tt>fnclex</tt> does not. <tt>wait</tt> and <tt>fwait</tt> are synonyms for the same
instruction, which causes the processor to check for pending unmasked FPU exceptions
and handle them before proceeding. These instructions have no operands.

<div class="p"><!----></div>
<a 
id="ffree21385"></a><tt>ffree</tt> sets the tag associated with specified FPU register to empty. The
operand should be an FPU register.

<div class="p"><!----></div>
<a 
id="fincstp21386"></a><a 

id="fdecstp21387"></a><tt>fincstp</tt> and <tt>fdecstp</tt> rotate the FPU stack by one by adding one to or
subtracting one from the pointer of the top of stack. These instructions have no
operands.

<div class="p"><!----></div>
     <a id="tth_sEc2.1.14"></a><h3>
2.1.14&nbsp;&nbsp;MMX instructions</h3>
<a id="sec:MMX_instructions">
</a>
................................................................................
which are the low 64-bit parts of the 80-bit FPU registers. Because of this MMX
instructions cannot be used at the same time as FPU instructions. They can operate
on packed bytes (eight 8-bit integers), packed words (four 16-bit integers) or
packed double words (two 32-bit integers). The use of packed formats allows
operations to be performed on multiple data elements at one time.

<div class="p"><!----></div>
<a 
id="movq21388"></a><tt>movq</tt> copies a quad word from the source operand to the destination operand.
At least one of the operands must be a MMX register, the second one can also be
a MMX register or 64-bit memory location.

<pre>
&nbsp;&nbsp;&nbsp;&nbsp;movq&nbsp;mm0,mm1&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;;&nbsp;move&nbsp;quad&nbsp;word&nbsp;from&nbsp;register&nbsp;to&nbsp;register
&nbsp;&nbsp;&nbsp;&nbsp;movq&nbsp;mm2,[ebx]&nbsp;&nbsp;&nbsp;;&nbsp;move&nbsp;quad&nbsp;word&nbsp;from&nbsp;memory&nbsp;to&nbsp;register

</pre>

<div class="p"><!----></div>
<a 
id="movd21389"></a><tt>movd</tt> copies a double word from the source operand to the destination operand.
One of the operands must be a MMX register, the second one can be a general register
or 32-bit memory location. Only the low double word of the MMX register is used.

<div class="p"><!----></div>
All general MMX operations have two operands, the destination operand should be
a MMX register, the source operand can be a MMX register or 64-bit memory location.
The operation is performed on the corresponding data elements of the source and destination
operands and the result is stored in the data elements of the destination operand.
<a 
id="paddb21390"></a><a 
id="paddw21391"></a><a 
id="paddd21392"></a>

<tt>paddb</tt>, <tt>paddw</tt> and <tt>paddd</tt> perform the addition of packed bytes,
packed words, or packed double words.  
<a 
id="psubb21393"></a><a 
id="psubw21394"></a><a 
id="psubd21395"></a>

<tt>psubb</tt>, <tt>psubw</tt> and <tt>psubd</tt> perform the subtraction of the appropriate types.
<a 
id="paddsb21396"></a><a 
id="paddsw21397"></a><a 
id="psubsb21398"></a><a 
id="psubsw21399"></a>

<tt>paddsb</tt>, <tt>paddsw</tt>, <tt>psubsb</tt> and <tt>psubsw</tt> perform the addition or
subtraction of packed bytes or packed words with signed saturation.
<a 
id="paddusb21400"></a><a 
id="paddusw21401"></a><a 
id="psubusb21402"></a><a 
id="psubusw21403"></a>

<tt>paddusb</tt>, <tt>paddusw</tt>, <tt>psubusb</tt> and <tt>psubusw</tt> are analogous, but with
unsigned saturation.
<a 
id="pmulhw21404"></a><a 
id="pmullw21405"></a>&nbsp;<tt>pmulhw</tt> and <tt>pmullw</tt> perform a signed multiplication of the packed words
and store the high or low words of the results in the destination operand.
<a 
id="pmaddwd21406"></a>

<tt>pmaddwd</tt> multiplies the packed words and adds the four intermediate
double word products in pairs to produce a result of two packed double words.
<a 
id="pand21407"></a><a 
id="por21408"></a><a 
id="pxor21409"></a><a 
id="pandn21410"></a>

<tt>pand</tt>, <tt>por</tt> and <tt>pxor</tt> perform the logical operations on the quad words,
<tt>pandn</tt> also performs a logical negation of the destination operand before the
operation.
<a 
id="pcmpeqb21411"></a><a 
id="pcmpeqw21412"></a><a 
id="pcmpeqd21413"></a>

<tt>pcmpeqb</tt>, <tt>pcmpeqw</tt> and <tt>pcmpeqd</tt> compare for equality of packed
bytes, packed words or packed double words. If a pair of data elements is equal,
the corresponding data element in the destination operand is filled with bits of
value 1, otherwise it's set to 0. 
<a 
id="pcmpgtb21414"></a><a 
id="pcmpgtw21415"></a><a 
id="pcmpgtd21416"></a>

<tt>pcmpgtb</tt>, <tt>pcmpgtw</tt> and <tt>pcmpgtd</tt>
perform a similar operation, but they check whether the data elements in
the destination operand are greater than the corresponding data elements in the
source operand.
<a 
id="packsswb21417"></a><a 
id="packssdw21418"></a><a 
id="packuswb21419"></a>

<tt>packsswb</tt> converts packed signed words into packed signed bytes, <tt>packssdw</tt>
converts packed signed double words into packed signed words, using saturation to
handle overflow conditions. <tt>packuswb</tt> converts packed signed words into
packed unsigned bytes. Converted data elements from the source operand are stored
in the low part of the destination operand, while converted data elements from
the destination operand are stored in the high part.
<a 
id="punpckhbw21420"></a><a 
id="punpckhwd21421"></a><a 
id="punpckhdq21422"></a>

<tt>punpckhbw</tt>, <tt>punpckhwd</tt> and <tt>punpckhdq</tt> interleave the
data elements from the high parts of the source and destination operands and store
the result into the destination operand.
<a 
id="punpcklbw21423"></a><a 
id="punpcklwd21424"></a><a 
id="punpckldq21425"></a>

<tt>punpcklbw</tt>, <tt>punpcklwd</tt> and
<tt>punpckldq</tt> perform the same operation, but the low parts of the source and destination
operand are used.

<pre>
&nbsp;&nbsp;&nbsp;&nbsp;paddsb&nbsp;mm0,[esi]&nbsp;;&nbsp;add&nbsp;packed&nbsp;bytes&nbsp;with&nbsp;signed&nbsp;saturation
&nbsp;&nbsp;&nbsp;&nbsp;pcmpeqw&nbsp;mm3,mm7&nbsp;&nbsp;;&nbsp;compare&nbsp;packed&nbsp;words&nbsp;for&nbsp;equality

</pre>
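
<div class="p"><!----></div>
As a further sketch, interleaving with a zeroed register is one way to widen
unsigned bytes into words: each byte of the destination is followed by a zero byte
taken from the source. The register choice here is arbitrary.

<pre>
&nbsp;&nbsp;&nbsp;&nbsp;pxor&nbsp;mm7,mm7&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;;&nbsp;clear&nbsp;mm7
&nbsp;&nbsp;&nbsp;&nbsp;movd&nbsp;mm0,[esi]&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;;&nbsp;load&nbsp;four&nbsp;bytes&nbsp;into&nbsp;the&nbsp;low&nbsp;double&nbsp;word&nbsp;of&nbsp;mm0
&nbsp;&nbsp;&nbsp;&nbsp;punpcklbw&nbsp;mm0,mm7&nbsp;&nbsp;&nbsp;;&nbsp;interleave&nbsp;with&nbsp;zero&nbsp;bytes,&nbsp;giving&nbsp;four&nbsp;words

</pre>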

<div class="p"><!----></div>
<a 
id="psllw21426"></a><a 
id="pslld21427"></a><a 
id="psllq21428"></a><tt>psllw</tt>, <tt>pslld</tt> and <tt>psllq</tt> perform logical shift left of the packed
words, packed double words or a single quad word in the destination operand by the
amount specified in the source operand. 
<a 
id="psrlw21429"></a><a 
id="psrld21430"></a><a 
id="psrlq21431"></a>

<tt>psrlw</tt>, <tt>psrld</tt> and <tt>psrlq</tt> perform logical shift right of the packed words, 
packed double words or a single quad word. 
<a 
id="psraw21432"></a><a 
id="psrad21433"></a>

<tt>psraw</tt> and <tt>psrad</tt> perform arithmetic shift right of the packed words or
double words. The destination operand should be a MMX register, while the source operand
can be a MMX register, 64-bit memory location, or 8-bit immediate value.

<pre>
&nbsp;&nbsp;&nbsp;&nbsp;psllw&nbsp;mm2,mm4&nbsp;&nbsp;&nbsp;&nbsp;;&nbsp;shift&nbsp;words&nbsp;left&nbsp;logically
&nbsp;&nbsp;&nbsp;&nbsp;psrad&nbsp;mm4,[ebx]&nbsp;&nbsp;;&nbsp;shift&nbsp;double&nbsp;words&nbsp;right&nbsp;arithmetically

</pre>
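
<div class="p"><!----></div>
The immediate form might look as follows (the shift counts are chosen only for
illustration):

<pre>
&nbsp;&nbsp;&nbsp;&nbsp;psrlw&nbsp;mm0,8&nbsp;&nbsp;&nbsp;&nbsp;;&nbsp;shift&nbsp;each&nbsp;word&nbsp;right&nbsp;logically&nbsp;by&nbsp;8&nbsp;bits
&nbsp;&nbsp;&nbsp;&nbsp;psraw&nbsp;mm1,15&nbsp;&nbsp;&nbsp;;&nbsp;fill&nbsp;each&nbsp;word&nbsp;with&nbsp;copies&nbsp;of&nbsp;its&nbsp;sign&nbsp;bit

</pre>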

<div class="p"><!----></div>
<a 
id="emms21434"></a><tt>emms</tt> makes the FPU registers usable for the FPU instructions; it must be used
before using the FPU instructions if any MMX instructions were used.
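
<div class="p"><!----></div>
A minimal sketch of a block of MMX code followed by FPU code is shown below; the
registers used for addressing are arbitrary.

<pre>
&nbsp;&nbsp;&nbsp;&nbsp;movq&nbsp;mm0,[esi]&nbsp;&nbsp;&nbsp;&nbsp;;&nbsp;load&nbsp;four&nbsp;packed&nbsp;words
&nbsp;&nbsp;&nbsp;&nbsp;paddw&nbsp;mm0,[edi]&nbsp;&nbsp;&nbsp;;&nbsp;add&nbsp;four&nbsp;packed&nbsp;words&nbsp;from&nbsp;memory
&nbsp;&nbsp;&nbsp;&nbsp;movq&nbsp;[ebx],mm0&nbsp;&nbsp;&nbsp;&nbsp;;&nbsp;store&nbsp;the&nbsp;result
&nbsp;&nbsp;&nbsp;&nbsp;emms&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;;&nbsp;make&nbsp;the&nbsp;FPU&nbsp;registers&nbsp;usable&nbsp;again
&nbsp;&nbsp;&nbsp;&nbsp;fld&nbsp;dword&nbsp;[eax]&nbsp;&nbsp;&nbsp;;&nbsp;FPU&nbsp;instructions&nbsp;may&nbsp;follow

</pre>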

<div class="p"><!----></div>
     <a id="tth_sEc2.1.15"></a><h3>
2.1.15&nbsp;&nbsp;SSE instructions</h3>
The SSE extension adds more MMX instructions and also introduces the
operations on packed single precision floating point values. The 128-bit
packed single precision format consists of four single precision floating
point values. The 128-bit SSE registers are designed for the purpose of
operations on this data type.

<div class="p"><!----></div>
<a 
id="movaps21435"></a><a 
id="movups21436"></a><tt>movaps</tt> and <tt>movups</tt> transfer a double quad word operand containing packed
single precision values from source operand to destination operand. At least
one of the operands has to be a SSE register, the second one can also be a
SSE register or 128-bit memory location. Memory operands for the <tt>movaps</tt>
instruction must be aligned on a 16-byte boundary, operands for the <tt>movups</tt>
instruction don't have to be aligned.

<pre>
&nbsp;&nbsp;&nbsp;&nbsp;movups&nbsp;xmm0,[ebx]&nbsp;&nbsp;;&nbsp;move&nbsp;unaligned&nbsp;double&nbsp;quad&nbsp;word

</pre>
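
<div class="p"><!----></div>
For the aligned variant the data itself has to be placed on a 16-byte boundary. The
sketch below assumes a hypothetical label <tt>vec</tt> and an output format in which
the <tt>align</tt> directive can provide 16-byte alignment; the label is defined
without a data directive so that it carries no size of its own.

<pre>
&nbsp;&nbsp;&nbsp;&nbsp;movaps&nbsp;xmm0,[vec]&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;;&nbsp;aligned&nbsp;load,&nbsp;vec&nbsp;lies&nbsp;on&nbsp;a&nbsp;16-byte&nbsp;boundary

&nbsp;&nbsp;&nbsp;&nbsp;align&nbsp;16&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;;&nbsp;data,&nbsp;placed&nbsp;outside&nbsp;the&nbsp;instruction&nbsp;stream
vec:
&nbsp;&nbsp;&nbsp;&nbsp;dd&nbsp;1.0,2.0,3.0,4.0&nbsp;&nbsp;&nbsp;&nbsp;;&nbsp;four&nbsp;single&nbsp;precision&nbsp;values

</pre>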

<div class="p"><!----></div>
<a 
id="movlps21437"></a><a 
id="movhps21438"></a><tt>movlps</tt> moves two packed single precision values between memory and the
low quad word of a SSE register. <tt>movhps</tt> moves two packed single precision
values between memory and the high quad word of a SSE register. One of the
operands must be a SSE register, and the other operand must be a 64-bit memory
location.

<pre>
&nbsp;&nbsp;&nbsp;&nbsp;movlps&nbsp;xmm0,[ebx]&nbsp;&nbsp;;&nbsp;move&nbsp;memory&nbsp;to&nbsp;low&nbsp;quad&nbsp;word&nbsp;of&nbsp;xmm0
&nbsp;&nbsp;&nbsp;&nbsp;movhps&nbsp;[esi],xmm7&nbsp;&nbsp;;&nbsp;move&nbsp;high&nbsp;quad&nbsp;word&nbsp;of&nbsp;xmm7&nbsp;to&nbsp;memory

</pre>

<div class="p"><!----></div>
<a 
id="movlhps21439"></a><a 
id="movhlps21440"></a><tt>movlhps</tt> moves two packed single precision values from the low quad word
of the source register to the high quad word of the destination register. <tt>movhlps</tt>
moves two packed single precision values from the high quad word of the source
register to the low quad word of the destination register. Both operands have to
be SSE registers.

<div class="p"><!----></div>
<a 
id="movmskps21441"></a><tt>movmskps</tt> transfers the most significant bit of each of the four single
precision values in the SSE register into low four bits of a general register.
The source operand must be a SSE register, the destination operand must be a
general register.

<div class="p"><!----></div>
<a 
id="movss21442"></a><tt>movss</tt> transfers a single precision value between source and destination
operand (only the low double word is transferred). At least one of the operands
has to be a SSE register, the second one can also be a SSE register or 32-bit
memory location.

<pre>
&nbsp;&nbsp;&nbsp;&nbsp;movss&nbsp;[edi],xmm3&nbsp;&nbsp;&nbsp;;&nbsp;move&nbsp;low&nbsp;double&nbsp;word&nbsp;of&nbsp;xmm3&nbsp;to&nbsp;memory

................................................................................
destination register. When the mnemonic ends with <tt>ss</tt>, the source operand
can be a 32-bit memory location or a SSE register, the destination operand
must be a SSE register and the operation is performed on single precision
values, only low double words of SSE registers are used in this case, the
result is stored in the low double word of destination register. 

<div class="p"><!----></div>
<a 
id="addps21443"></a><a 
id="addss21444"></a><a 
id="subps21445"></a><a 
id="subss21446"></a><a 
id="mulps21447"></a><a 
id="mulss21448"></a><a 
id="divps21449"></a><a 
id="divss21450"></a><a 
id="rcpps21451"></a><a 
id="rcpss21452"></a><a 
id="sqrtps21453"></a><a 
id="sqrtss21454"></a><a 
id="rsqrtps21455"></a><a 
id="rsqrtss21456"></a><a 

id="maxps21457"></a><a 
id="maxss21458"></a><a 
id="minps21459"></a><a 
id="minss21460"></a><tt>addps</tt> and <tt>addss</tt> add the values, <tt>subps</tt> and <tt>subss</tt> subtract the
source value from destination value, <tt>mulps</tt> and <tt>mulss</tt> multiply the values, 
<tt>divps</tt> and <tt>divss</tt> divide the destination value by the source value, 
<tt>rcpps</tt> and <tt>rcpss</tt> compute the approximate reciprocal of the source value, 
<tt>sqrtps</tt> and <tt>sqrtss</tt> compute the square root of the source value, 
<tt>rsqrtps</tt> and <tt>rsqrtss</tt> compute the approximate reciprocal of square root 
of the source value, <tt>maxps</tt> and <tt>maxss</tt> compare the source and destination 
values and return the greater one, <tt>minps</tt> and <tt>minss</tt> compare the source and 
................................................................................
<pre>
&nbsp;&nbsp;&nbsp;&nbsp;mulss&nbsp;xmm0,[ebx]&nbsp;&nbsp;&nbsp;;&nbsp;multiply&nbsp;single&nbsp;precision&nbsp;values
&nbsp;&nbsp;&nbsp;&nbsp;addps&nbsp;xmm3,xmm7&nbsp;&nbsp;&nbsp;&nbsp;;&nbsp;add&nbsp;packed&nbsp;single&nbsp;precision&nbsp;values

</pre>

<div class="p"><!----></div>
<a 
id="andps21461"></a><a 
id="andnps21462"></a><a 
id="orps21463"></a><a 
id="xorps21464"></a><tt>andps</tt>, <tt>andnps</tt>, <tt>orps</tt> and <tt>xorps</tt> perform the logical operations on
packed single precision values. The source operand can be a 128-bit memory
location or a SSE register, the destination operand must be a SSE register.

<div class="p"><!----></div>
<a 
id="cmpps21465"></a><a 
id="cmpss21466"></a><a 
id="cmpeqps21467"></a><a 
id="cmpeqss21468"></a><a 
id="cmpltps21469"></a><a 
id="cmpltss21470"></a><a 
id="cmpleps21471"></a><a 
id="cmpless21472"></a><a 
id="cmpunordps21473"></a><a 
id="cmpunordss21474"></a><a 
id="cmpneqps21475"></a><a 
id="cmpneqss21476"></a><a 
id="cmpnltps21477"></a><a 
id="cmpnltss21478"></a><a 
id="cmpnleps21479"></a><a 
id="cmpnless21480"></a><a 
id="cmpordps21481"></a><a 

id="cmpordss21482"></a><tt>cmpps</tt> compares packed single precision values and returns a mask result
into the destination operand, which must be a SSE register. The source operand
can be a 128-bit memory location or SSE register, the third operand must be an
immediate operand selecting code of one of the eight compare conditions
(table 2.3). <tt>cmpss</tt> performs the same operation on single precision values,
only low double word of destination register is affected, in this case source
operand can be a 32-bit memory location or SSE register. These two
instructions have also variants with only two operands and the condition
................................................................................

<div class="p"><!----></div>

<div class="p"><!----></div>
<a id="tth_tAb2.3">
</a> 
<div style="text-align:center">
<table border="1">
<tr><td align="center">Code </td><td align="center">Mnemonic </td><td align="center">Description </td></tr><tr><td></td></tr>
<tr><td align="center">0 </td><td align="center"><tt>eq</tt> </td><td align="center">equal </td></tr>
<tr><td align="center">1 </td><td align="center"><tt>lt</tt> </td><td align="center">less than </td></tr>
<tr><td align="center">2 </td><td align="center"><tt>le</tt> </td><td align="center">less than or equal </td></tr>
<tr><td align="center">3 </td><td align="center"><tt>unord</tt> </td><td align="center">unordered </td></tr>
<tr><td align="center">4 </td><td align="center"><tt>neq</tt> </td><td align="center">not equal </td></tr>
<tr><td align="center">5 </td><td align="center"><tt>nlt</tt> </td><td align="center">not less than </td></tr>
................................................................................
</div>

<div style="text-align:center">Table 2.3: SSE conditions.</div>
<a id="tab:SSE_conditions">
</a>

<div class="p"><!----></div>
<a 
id="comiss21483"></a><a 
id="ucomiss21484"></a><tt>comiss</tt> and <tt>ucomiss</tt> compare the single precision values and set the ZF,
PF and CF flags to show the result. The destination operand must be a SSE
register, the source operand can be a 32-bit memory location or SSE register.

<div class="p"><!----></div>
<a 
id="shufps21485"></a><tt>shufps</tt> moves any two of the four single precision values from the
destination operand into the low quad word of the destination operand, and any
two of the four values from the source operand into the high quad word of the
destination operand. The destination operand must be a SSE register, the
source operand can be a 128-bit memory location or SSE register, the third
operand must be an 8-bit immediate value selecting which values will be moved
into the destination operand. Bits 0 and 1 select the value to be moved from
destination operand to the low double word of the result, bits 2 and 3 select
................................................................................

<pre>
&nbsp;&nbsp;&nbsp;&nbsp;shufps&nbsp;xmm0,xmm0,10010011b&nbsp;;&nbsp;shuffle&nbsp;double&nbsp;words

</pre>
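
<div class="p"><!----></div>
A frequently seen special case, sketched here, uses the same register for both
operands and a selector of zero, so that every position of the result receives
the lowest value:

<pre>
&nbsp;&nbsp;&nbsp;&nbsp;movss&nbsp;xmm0,[ebx]&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;;&nbsp;load&nbsp;one&nbsp;single&nbsp;precision&nbsp;value
&nbsp;&nbsp;&nbsp;&nbsp;shufps&nbsp;xmm0,xmm0,0&nbsp;&nbsp;&nbsp;&nbsp;;&nbsp;copy&nbsp;the&nbsp;low&nbsp;value&nbsp;into&nbsp;all&nbsp;four&nbsp;positions

</pre>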

<div class="p"><!----></div>
<a 
id="unpckhps21486"></a><a 
id="unpcklps21487"></a><tt>unpckhps</tt> performs an interleaved unpack of the values from the high parts
of the source and destination operands and stores the result in the
destination operand, which must be a SSE register. The source operand can be
a 128-bit memory location or a SSE register. <tt>unpcklps</tt> performs an
interleaved unpack of the values from the low parts of the source and
destination operand and stores the result in the destination operand,
the rules for operands are the same.

<div class="p"><!----></div>
<a 
id="cvtpi2ps21488"></a><tt>cvtpi2ps</tt> converts two packed double word integers into two packed
single precision floating point values and stores the result in the low quad
word of the destination operand, which should be a SSE register. The source
operand can be a 64-bit memory location or MMX register.

<pre>
&nbsp;&nbsp;&nbsp;&nbsp;cvtpi2ps&nbsp;xmm0,mm0&nbsp;&nbsp;;&nbsp;integers&nbsp;to&nbsp;single&nbsp;precision&nbsp;values

</pre>

<div class="p"><!----></div>
<a 
id="cvtsi2ss21489"></a><tt>cvtsi2ss</tt> converts a double word integer into a single precision floating
point value and stores the result in the low double word of the destination
operand, which should be a SSE register. The source operand can be a 32-bit
memory location or 32-bit general register.

<pre>
&nbsp;&nbsp;&nbsp;&nbsp;cvtsi2ss&nbsp;xmm0,eax&nbsp;&nbsp;;&nbsp;integer&nbsp;to&nbsp;single&nbsp;precision&nbsp;value

</pre>

<div class="p"><!----></div>
<a 
id="cvtps2pi21490"></a><a 
id="cvttps2pi21491"></a><tt>cvtps2pi</tt> converts packed two single precision floating point values into
packed two double word integers and stores the result in the destination
operand, which should be a MMX register. The source operand can be a 64-bit
memory location or SSE register, only low quad word of SSE register is used.
<tt>cvttps2pi</tt> performs a similar operation, except that truncation is used to
round the source values to integers; the rules for the operands are the same.

<pre>
&nbsp;&nbsp;&nbsp;&nbsp;cvtps2pi&nbsp;mm0,xmm0&nbsp;&nbsp;;&nbsp;single&nbsp;precision&nbsp;values&nbsp;to&nbsp;integers

</pre>

<div class="p"><!----></div>
<a 
id="cvtss2si21492"></a><a 
id="cvttss2si21493"></a><tt>cvtss2si</tt> converts a single precision floating point value into a double
word integer and stores the result in the destination operand, which should be
a 32-bit general register. The source operand can be a 32-bit memory location
or SSE register, only low double word of SSE register is used. <tt>cvttss2si</tt>
performs the similar operation, except that truncation is used to round a
source value to integer, rules for the operands are the same.

<pre>
&nbsp;&nbsp;&nbsp;&nbsp;cvtss2si&nbsp;eax,xmm0&nbsp;&nbsp;;&nbsp;single&nbsp;precision&nbsp;value&nbsp;to&nbsp;integer

</pre>
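
<div class="p"><!----></div>
The two variants can be put side by side as in the sketch below. That the
non-truncating form follows the rounding mode selected in the MXCSR register is a
property of the processor rather than something specific to the assembler.

<pre>
&nbsp;&nbsp;&nbsp;&nbsp;cvtss2si&nbsp;eax,xmm0&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;;&nbsp;round&nbsp;according&nbsp;to&nbsp;the&nbsp;rounding&nbsp;mode&nbsp;set&nbsp;in&nbsp;MXCSR
&nbsp;&nbsp;&nbsp;&nbsp;cvttss2si&nbsp;edx,xmm0&nbsp;&nbsp;&nbsp;&nbsp;;&nbsp;always&nbsp;truncate&nbsp;toward&nbsp;zero

</pre>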

<div class="p"><!----></div>
<a 
id="pextrw21494"></a><tt>pextrw</tt> copies the word in the source operand specified by the third
operand to the destination operand. The source operand must be a MMX register,
the destination operand must be a 32-bit general register (the high word of
the destination is cleared), the third operand must be an 8-bit immediate value.

<pre>
&nbsp;&nbsp;&nbsp;&nbsp;pextrw&nbsp;eax,mm0,1&nbsp;&nbsp;&nbsp;;&nbsp;extract&nbsp;word&nbsp;into&nbsp;eax

</pre>

<div class="p"><!----></div>
<a 
id="pinsrw21495"></a><tt>pinsrw</tt> inserts a word from the source operand in the destination operand
at the location specified with the third operand, which must be an 8-bit
immediate value. The destination operand must be a MMX register, the source
operand can be a 16-bit memory location or 32-bit general register (only low
word of the register is used).

<pre>
&nbsp;&nbsp;&nbsp;&nbsp;pinsrw&nbsp;mm1,ebx,2&nbsp;&nbsp;&nbsp;;&nbsp;insert&nbsp;word&nbsp;from&nbsp;ebx

</pre>

<div class="p"><!----></div>
<a 
id="pavgb21496"></a><a 
id="pavgw21497"></a><a 
id="pmaxub21498"></a><a 
id="pminub21499"></a><a 
id="pmaxsw21500"></a><a 
id="pminsw21501"></a><a 
id="pmulhuw21502"></a><a 

id="psadbw21503"></a><tt>pavgb</tt> and <tt>pavgw</tt> compute the average of packed bytes or words. <tt>pmaxub</tt>
returns the maximum values of packed unsigned bytes, <tt>pminub</tt> returns the
minimum values of packed unsigned bytes, <tt>pmaxsw</tt> returns the maximum values
of packed signed words, <tt>pminsw</tt> returns the minimum values of packed signed
words. <tt>pmulhuw</tt> performs an unsigned multiplication of the packed words and stores
the high words of the results in the destination operand. <tt>psadbw</tt> computes
the absolute differences of packed unsigned bytes, sums the differences, and
stores the sum in the low word of the destination operand. All these instructions
follow the same rules for operands as the general MMX operations described in
the previous section.

<div class="p"><!----></div>
<a 
id="pmovmskb21504"></a><tt>pmovmskb</tt> creates a mask made of the most significant bit of each byte in
the source operand and stores the result in the low byte of destination
operand. The source operand must be a MMX register, the destination operand
must be a 32-bit general register.
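
<div class="p"><!----></div>
Together with a packed comparison, this instruction gives a compact way to test
whether any byte of two quad words matches. The following is a sketch only; the
label <tt>no_match</tt> is hypothetical.

<pre>
&nbsp;&nbsp;&nbsp;&nbsp;pcmpeqb&nbsp;mm0,mm1&nbsp;&nbsp;&nbsp;;&nbsp;set&nbsp;all&nbsp;bits&nbsp;of&nbsp;each&nbsp;byte&nbsp;where&nbsp;mm0&nbsp;and&nbsp;mm1&nbsp;are&nbsp;equal
&nbsp;&nbsp;&nbsp;&nbsp;pmovmskb&nbsp;eax,mm0&nbsp;&nbsp;;&nbsp;collect&nbsp;the&nbsp;top&nbsp;bit&nbsp;of&nbsp;every&nbsp;byte&nbsp;into&nbsp;al
&nbsp;&nbsp;&nbsp;&nbsp;test&nbsp;eax,eax&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;;&nbsp;is&nbsp;any&nbsp;bit&nbsp;set?
&nbsp;&nbsp;&nbsp;&nbsp;jz&nbsp;no_match&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;;&nbsp;no&nbsp;byte&nbsp;matched

</pre>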

<div class="p"><!----></div>
<a 
id="pshufw21505"></a><tt>pshufw</tt> inserts words from the source operand in the destination operand
from the locations specified with the third operand. The destination operand
must be a MMX register, the source operand can be a 64-bit memory location or
MMX register, the third operand must be an 8-bit immediate value selecting which
values will be moved into the destination operand, in a similar way to the third
operand of the <tt>shufps</tt> instruction.

<div class="p"><!----></div>
<a 
id="movntq21506"></a><a 
id="movntps21507"></a><a 

id="maskmovq21508"></a><tt>movntq</tt> moves the quad word from the source operand to memory using a
non-temporal hint to minimize cache pollution. The source operand should be a
MMX register, the destination operand should be a 64-bit memory location.
<tt>movntps</tt> stores packed single precision values from the SSE register to
memory using a non-temporal hint. The source operand should be a SSE register,
the destination operand should be a 128-bit memory location. <tt>maskmovq</tt> stores
selected bytes from the first operand into a 64-bit memory location using a
non-temporal hint. Both operands should be MMX registers, the second operand
selects which bytes from the source operand are written to memory. The
memory location is pointed to by the DI (or EDI) register in the segment selected
by DS.

<div class="p"><!----></div>
<a 
id="prefetch021509"></a><a 
id="prefetch121510"></a><a 
id="prefetch221511"></a><a 
id="prefetchnta21512"></a><tt>prefetcht0</tt>, <tt>prefetcht1</tt>, <tt>prefetcht2</tt> and <tt>prefetchnta</tt> fetch the line
of data from memory that contains the byte specified with the operand to a
specified location in the cache hierarchy. The operand should be an 8-bit memory
location.

<div class="p"><!----></div>
<a 
id="sfence21513"></a><tt>sfence</tt> performs a serializing operation on all instructions storing to
memory that were issued prior to it. This instruction has no operands.
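
<div class="p"><!----></div>
It is typically placed after a run of non-temporal stores, as in this sketch,
which assumes that <tt>edi</tt> points to a 16-byte aligned buffer:

<pre>
&nbsp;&nbsp;&nbsp;&nbsp;movntps&nbsp;[edi],xmm0&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;;&nbsp;non-temporal&nbsp;store,&nbsp;edi&nbsp;assumed&nbsp;to&nbsp;be&nbsp;16-byte&nbsp;aligned
&nbsp;&nbsp;&nbsp;&nbsp;movntps&nbsp;[edi+16],xmm1
&nbsp;&nbsp;&nbsp;&nbsp;sfence&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;;&nbsp;earlier&nbsp;stores&nbsp;become&nbsp;globally&nbsp;visible&nbsp;before&nbsp;later&nbsp;ones

</pre>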

<div class="p"><!----></div>
<a 
id="ldmxcsr21514"></a><a 
id="stmxcsr21515"></a><tt>ldmxcsr</tt> loads the 32-bit memory operand into the MXCSR register. <tt>stmxcsr</tt>
stores the contents of MXCSR into a 32-bit memory operand.
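
<div class="p"><!----></div>
The usual pattern is to store the register, modify the copy in memory and load it
back. The sketch below assumes a hypothetical 32-bit variable <tt>mxcsr_val</tt>
declared elsewhere with <tt>dd</tt>, and sets bit 15, which is the flush-to-zero
flag of MXCSR.

<pre>
&nbsp;&nbsp;&nbsp;&nbsp;stmxcsr&nbsp;[mxcsr_val]&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;;&nbsp;read&nbsp;the&nbsp;current&nbsp;MXCSR
&nbsp;&nbsp;&nbsp;&nbsp;or&nbsp;dword&nbsp;[mxcsr_val],8000h&nbsp;&nbsp;&nbsp;;&nbsp;set&nbsp;bit&nbsp;15&nbsp;(flush&nbsp;to&nbsp;zero)
&nbsp;&nbsp;&nbsp;&nbsp;ldmxcsr&nbsp;[mxcsr_val]&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;;&nbsp;write&nbsp;it&nbsp;back

</pre>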

<div class="p"><!----></div>
<a 
id="fxsave21516"></a><a 
id="fxrstor21517"></a><a 

id="fxsave21518"></a><tt>fxsave</tt> saves the current state of the FPU, MXCSR register, and all the FPU
and SSE registers to a 512-byte memory location specified in the destination
operand. <tt>fxrstor</tt> reloads data previously stored with <tt>fxsave</tt> instruction
from the specified 512-byte memory location. The memory operand for both these
instructions must be aligned on a 16-byte boundary and should be an operand
of no specified size.
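
<div class="p"><!----></div>
One way to satisfy both requirements is sketched below. The name <tt>fx_area</tt>
is hypothetical; it is defined as a plain label in front of the reserved bytes, so
that no size is attached to it, and the output format is assumed to allow 16-byte
alignment with the <tt>align</tt> directive.

<pre>
&nbsp;&nbsp;&nbsp;&nbsp;fxsave&nbsp;[fx_area]&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;;&nbsp;save&nbsp;the&nbsp;FPU,&nbsp;MXCSR&nbsp;and&nbsp;SSE&nbsp;state
&nbsp;&nbsp;&nbsp;&nbsp;fxrstor&nbsp;[fx_area]&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;;&nbsp;restore&nbsp;it&nbsp;later

&nbsp;&nbsp;&nbsp;&nbsp;align&nbsp;16
fx_area:&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;;&nbsp;plain&nbsp;label,&nbsp;so&nbsp;the&nbsp;operand&nbsp;has&nbsp;no&nbsp;declared&nbsp;size
&nbsp;&nbsp;&nbsp;&nbsp;rb&nbsp;512&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;;&nbsp;reserve&nbsp;the&nbsp;512-byte&nbsp;save&nbsp;area

</pre>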

<div class="p"><!----></div>
................................................................................
     <a id="tth_sEc2.1.16"></a><h3>
2.1.16&nbsp;&nbsp;SSE2 instructions</h3>
The SSE2 extension introduces the operations on packed double precision
floating point values, extends the syntax of MMX instructions, and adds also
some new instructions.

<div class="p"><!----></div>
<a 
id="movapd21519"></a><a 
id="movupd21520"></a><tt>movapd</tt> and <tt>movupd</tt> transfer a double quad word operand containing packed
double precision values from source operand to destination operand. These
instructions are analogous to <tt>movaps</tt> and <tt>movups</tt> and have the same rules
for operands.

<div class="p"><!----></div>
<a 
id="movlpd21521"></a><a 
id="movhpd21522"></a><tt>movlpd</tt> moves a double precision value between memory and the low quad
word of a SSE register. <tt>movhpd</tt> moves a double precision value between memory
and the high quad word of a SSE register. These instructions are analogous to
<tt>movlps</tt> and <tt>movhps</tt> and have the same rules for operands.

<div class="p"><!----></div>
<a 
id="movmskpd21523"></a><tt>movmskpd</tt> transfers the most significant bit of each of the two double
precision values in the SSE register into low two bits of a general register.
This instruction is analogous to <tt>movmskps</tt> and has the same rules for
operands.

<div class="p"><!----></div>
<a 
id="movsd21524"></a><tt>movsd</tt> transfers a double precision value between source and destination
operand (only the low quad word is transferred). At least one of the operands
has to be a SSE register, the second one can also be a SSE register or 64-bit
memory location.

<div class="p"><!----></div>
<a 
id="addpd21525"></a><a 
id="addsd21526"></a><a 
id="subpd21527"></a><a 
id="subsd21528"></a><a 
id="mulpd21529"></a><a 
id="mulsd21530"></a><a 
id="divpd21531"></a><a 
id="divsd21532"></a><a 
id="sqrtpd21533"></a><a 
id="sqrtsd21534"></a><a 

id="maxpd21535"></a><a 
id="maxsd21536"></a><a 
id="minpd21537"></a><a 
id="minsd21538"></a>Arithmetic operations on double precision values are: <tt>addpd</tt>, <tt>addsd</tt>,
<tt>subpd</tt>, <tt>subsd</tt>, <tt>mulpd</tt>, <tt>mulsd</tt>, <tt>divpd</tt>, <tt>divsd</tt>, <tt>sqrtpd</tt>, <tt>sqrtsd</tt>,
<tt>maxpd</tt>, <tt>maxsd</tt>, <tt>minpd</tt>, <tt>minsd</tt>, and they are analogous to arithmetic
operations on single precision values described in the previous section. When the
mnemonic ends with <tt>pd</tt> instead of <tt>ps</tt>, the operation is performed on packed
two double precision values, but rules for operands are the same. When the
mnemonic ends with <tt>sd</tt> instead of <tt>ss</tt>, the source operand can be a 64-bit
memory location or a SSE register, the destination operand must be a SSE
register and the operation is performed on double precision values, only low
quad words of SSE registers are used in this case.
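
<div class="p"><!----></div>
As an illustration of the packed and scalar forms (the registers and the
address in EBX are arbitrary choices made for this sketch):

<pre>
&nbsp;&nbsp;&nbsp;&nbsp;addpd&nbsp;xmm0,xmm1&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;;&nbsp;add&nbsp;two&nbsp;pairs&nbsp;of&nbsp;packed&nbsp;doubles
&nbsp;&nbsp;&nbsp;&nbsp;mulsd&nbsp;xmm0,qword&nbsp;[ebx]&nbsp;&nbsp;;&nbsp;multiply&nbsp;low&nbsp;double&nbsp;by&nbsp;value&nbsp;from&nbsp;memory
&nbsp;&nbsp;&nbsp;&nbsp;sqrtsd&nbsp;xmm2,xmm0&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;;&nbsp;square&nbsp;root&nbsp;of&nbsp;low&nbsp;double&nbsp;of&nbsp;xmm0

</pre>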

<div class="p"><!----></div>
<a 
id="andpd21539"></a><a 
id="andnpd21540"></a><a 
id="orpd21541"></a><a 
id="xorpd21542"></a><tt>andpd</tt>, <tt>andnpd</tt>, <tt>orpd</tt> and <tt>xorpd</tt> perform the logical operations on
packed double precision values. They are analogous to SSE logical operations
on single precision values and have the same rules for operands.

<div class="p"><!----></div>
<a 
id="cmppd21543"></a><a 
id="cmpsd21544"></a><a 
id="cmpeqpd21545"></a><a 
id="cmpeqsd21546"></a><a 
id="cmpltpd21547"></a><a 
id="cmpltsd21548"></a><a 
id="cmplepd21549"></a><a 
id="cmplesd21550"></a><a 
id="cmpunordpd21551"></a><a 
id="cmpunordsd21552"></a><a 
id="cmpneqpd21553"></a><a 
id="cmpneqsd21554"></a><a 
id="cmpnltpd21555"></a><a 
id="cmpnltsd21556"></a><a 
id="cmpnlepd21557"></a><a 
id="cmpnlesd21558"></a><a 
id="cmpordpd21559"></a><a 

id="cmpordsd21560"></a><tt>cmppd</tt> compares packed double precision values and returns a
mask result into the destination operand. This instruction is analogous to
<tt>cmpps</tt> and has the same rules for operands. <tt>cmpsd</tt> performs the same
operation on double precision values, only low quad word of destination
register is affected, in this case source operand can be a 64-bit memory or
SSE register. Variants with only two operands are obtained by attaching the
condition mnemonic from table <a href="#tab:SSE_conditions">2.3</a> to the <tt>cmp</tt> mnemonic and then attaching
the <tt>pd</tt> or <tt>sd</tt> at the end.
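
<div class="p"><!----></div>
The sketch below shows the same comparison written in both ways. It assumes
that the condition <tt>lt</tt> corresponds to the value 1 of the third operand,
as in the standard encoding of SSE comparison conditions; the registers are
arbitrary.

<pre>
&nbsp;&nbsp;&nbsp;&nbsp;cmppd&nbsp;xmm0,xmm1,1&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;;&nbsp;three&nbsp;operands,&nbsp;condition&nbsp;given&nbsp;as&nbsp;value
&nbsp;&nbsp;&nbsp;&nbsp;cmpltpd&nbsp;xmm0,xmm1&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;;&nbsp;two&nbsp;operands,&nbsp;condition&nbsp;in&nbsp;the&nbsp;mnemonic
&nbsp;&nbsp;&nbsp;&nbsp;cmpeqsd&nbsp;xmm2,qword&nbsp;[esi]&nbsp;&nbsp;;&nbsp;compare&nbsp;only&nbsp;low&nbsp;doubles&nbsp;for&nbsp;equality

</pre>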

<div class="p"><!----></div>
<a 
id="comisd21561"></a><a 
id="ucomisd21562"></a><tt>comisd</tt> and <tt>ucomisd</tt> compare the double precision values and set the ZF,
PF and CF flags to show the result. The destination operand must be a SSE
register, the source operand can be a 64-bit memory location or SSE register.

<div class="p"><!----></div>
<a 
id="shufpd21563"></a><a 
id="shufps21564"></a><tt>shufpd</tt> moves any of the two double precision values from the destination
operand into the low quad word of the destination operand, and any of the two
values from the source operand into the high quad word of the destination
operand. This instruction is analogous to <tt>shufps</tt> and has the same rules for
operands. Bit 0 of the third operand selects the value to be moved from the
destination operand, bit 1 selects the value to be moved from the source
operand, the rest of bits are reserved and must be zeroed.
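
<div class="p"><!----></div>
A short sketch of how the two selection bits work. The two lines are
independent examples, and the registers are chosen arbitrarily.

<pre>
&nbsp;&nbsp;&nbsp;&nbsp;shufpd&nbsp;xmm0,xmm1,01b&nbsp;&nbsp;;&nbsp;low&nbsp;=&nbsp;high&nbsp;of&nbsp;xmm0,&nbsp;high&nbsp;=&nbsp;low&nbsp;of&nbsp;xmm1
&nbsp;&nbsp;&nbsp;&nbsp;shufpd&nbsp;xmm2,xmm2,01b&nbsp;&nbsp;;&nbsp;swap&nbsp;the&nbsp;two&nbsp;doubles&nbsp;within&nbsp;xmm2

</pre>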

<div class="p"><!----></div>
<a 
id="unpckhpd21565"></a><a 
id="unpcklpd21566"></a><tt>unpckhpd</tt> performs an unpack of the high quad words from the source and
destination operands, <tt>unpcklpd</tt> performs an unpack of the low quad words from
the source and destination operands. They are analogous to <tt>unpckhps</tt> and
<tt>unpcklps</tt>, and have the same rules for operands.

<div class="p"><!----></div>
<a 
id="cvtps2pd21567"></a><a 
id="cvtpd2ps21568"></a><a 
id="cvtss2sd21569"></a><a 

id="cvtsd2ss21570"></a><tt>cvtps2pd</tt> converts the packed two single precision floating point values to
two packed double precision floating point values, the destination operand
must be a SSE register, the source operand can be a 64-bit memory location or
SSE register. <tt>cvtpd2ps</tt> converts the packed two double precision floating
point values to packed two single precision floating point values, the
destination operand must be a SSE register, the source operand can be a
128-bit memory location or SSE register. <tt>cvtss2sd</tt> converts the single
precision floating point value to double precision floating point value, the
................................................................................
destination operand must be a SSE register, the source operand can be a 32-bit
memory location or SSE register. <tt>cvtsd2ss</tt> converts the double precision
floating point value to single precision floating point value, the destination
operand must be a SSE register, the source operand can be 64-bit memory
location or SSE register.
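
<div class="p"><!----></div>
For illustration, some of these conversions with the memory sizes stated
explicitly (a sketch only, the registers and the address in ESI are arbitrary):

<pre>
&nbsp;&nbsp;&nbsp;&nbsp;cvtss2sd&nbsp;xmm0,dword&nbsp;[esi]&nbsp;&nbsp;;&nbsp;single&nbsp;from&nbsp;memory&nbsp;to&nbsp;double&nbsp;in&nbsp;xmm0
&nbsp;&nbsp;&nbsp;&nbsp;cvtsd2ss&nbsp;xmm1,xmm0&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;;&nbsp;double&nbsp;in&nbsp;xmm0&nbsp;to&nbsp;single&nbsp;in&nbsp;xmm1
&nbsp;&nbsp;&nbsp;&nbsp;cvtps2pd&nbsp;xmm2,qword&nbsp;[esi]&nbsp;&nbsp;;&nbsp;two&nbsp;singles&nbsp;from&nbsp;memory&nbsp;to&nbsp;two&nbsp;doubles

</pre>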

<div class="p"><!----></div>
<a 
id="cvtpi2pd21571"></a><a 
id="cvtsi2sd21572"></a><a 
id="cvtpd2pi21573"></a><a 
id="cvttpd2pi21574"></a><a 
id="cvtsd2si21575"></a><a 

id="cvttsd2si21576"></a><tt>cvtpi2pd</tt> converts packed two double word integers into the packed
double precision floating point values, the destination operand must be a SSE
register, the source operand can be a 64-bit memory location or MMX register.
<tt>cvtsi2sd</tt> converts a double word integer into a double precision floating
point value, the destination operand must be a SSE register, the source
operand can be a 32-bit memory location or 32-bit general register. <tt>cvtpd2pi</tt>
converts packed double precision floating point values into packed two double
word integers, the destination operand should be a MMX register, the source
................................................................................
precision floating point value into a double word integer, the destination
operand should be a 32-bit general register, the source operand can be a
64-bit memory location or SSE register. <tt>cvttsd2si</tt> performs the similar
operation, except that truncation is used to round a source value to integer,
rules for operands are the same.

<div class="p"><!----></div>
<a 
id="cvtps2dq21577"></a><a 
id="cvttps2dq21578"></a><a 
id="cvtpd2dq21579"></a><a 
id="cvttpd2dq21580"></a><a 

id="cvtdq2ps21581"></a><tt>cvtps2dq</tt> and <tt>cvttps2dq</tt> convert packed single precision floating point
values to packed four double word integers, storing them in the destination
operand. <tt>cvtpd2dq</tt> and <tt>cvttpd2dq</tt> convert packed double precision floating
point values to packed two double word integers, storing the result in the low
quad word of the destination operand. <tt>cvtdq2ps</tt> converts packed four double 
word integers to packed single precision floating point values. 

<div class="p"><!----></div>
For all these instructions destination operand must be a SSE register, the
source operand can be a 128-bit memory location or SSE register.

<div class="p"><!----></div>
<a 
id="cvtdq2pd21582"></a><tt>cvtdq2pd</tt> converts packed two double word integers from the low quad word
of the source operand to packed double precision floating point values, the source can be a 64-bit
memory location or SSE register, destination has to be SSE register.

<div class="p"><!----></div>
<a 
id="movdqa21583"></a><a 
id="movdqu21584"></a><tt>movdqa</tt> and <tt>movdqu</tt> transfer a double quad word operand containing packed
integers from source operand to destination operand. At least one of the
operands has to be a SSE register, the second one can be also a SSE register
or 128-bit memory location. Memory operands for <tt>movdqa</tt> instruction must be
aligned on boundary of 16 bytes, operands for <tt>movdqu</tt> instruction don't have
to be aligned.

<div class="p"><!----></div>
<a 
id="movq2dq21585"></a><a 
id="movdq2q21586"></a><tt>movq2dq</tt> moves the contents of the MMX source register to the low quad word
of destination SSE register. <tt>movdq2q</tt> moves the low quad word from the source
SSE register to the destination MMX register.

<pre>
&nbsp;&nbsp;&nbsp;&nbsp;movq2dq&nbsp;xmm0,mm1&nbsp;&nbsp;&nbsp;;&nbsp;move&nbsp;from&nbsp;MMX&nbsp;register&nbsp;to&nbsp;SSE&nbsp;register
&nbsp;&nbsp;&nbsp;&nbsp;movdq2q&nbsp;mm0,xmm1&nbsp;&nbsp;&nbsp;;&nbsp;move&nbsp;from&nbsp;SSE&nbsp;register&nbsp;to&nbsp;MMX&nbsp;register

</pre>

<div class="p"><!----></div>
<a 
id="pshufhw21587"></a><a 
id="pshuflw21588"></a><a 

id="pshufd21589"></a>All MMX instructions operating on the 64-bit packed integers (those with
mnemonics starting with <tt>p</tt>) are extended to operate on 128-bit packed
integers located in SSE registers. Additional syntax for these instructions
needs an SSE register where MMX register was needed, and the 128-bit memory
location or SSE register where 64-bit memory location or MMX register were
needed. The exception is <tt>pshufw</tt> instruction, which doesn't allow extended
syntax, but has two new variants: <tt>pshufhw</tt> and <tt>pshuflw</tt>, which allow only
the extended syntax, and perform the same operation as <tt>pshufw</tt> on the high
or low quad words of operands respectively. Also the new instruction <tt>pshufd</tt>
is introduced, which performs the same operation as <tt>pshufw</tt>, but on the
double words instead of words, it allows only the extended syntax.

<pre>
&nbsp;&nbsp;&nbsp;&nbsp;psubb&nbsp;xmm0,[esi]&nbsp;&nbsp;&nbsp;;&nbsp;subtract&nbsp;16&nbsp;packed&nbsp;bytes
&nbsp;&nbsp;&nbsp;&nbsp;pextrw&nbsp;eax,xmm0,7&nbsp;&nbsp;;&nbsp;extract&nbsp;highest&nbsp;word&nbsp;into&nbsp;eax

</pre>

<div class="p"><!----></div>
<a 
id="paddq21590"></a><a 
id="psubq21591"></a><a 
id="pmuludq21592"></a><tt>paddq</tt> performs the addition of packed quad words, <tt>psubq</tt> performs the
subtraction of packed quad words, <tt>pmuludq</tt> performs an unsigned multiplication
of low double words from each corresponding quad words and returns the results
in packed quad words. These instructions follow the same rules for operands as
the general MMX operations described in <a href="#sec:MMX_instructions">2.1.14</a>.

<div class="p"><!----></div>
<a 
id="pslldq21593"></a><a 
id="psrldq21594"></a><tt>pslldq</tt> and <tt>psrldq</tt> perform logical shift left or right of the double
quad word in the destination operand by the amount of bytes specified in the source
operand. The destination operand should be a SSE register, source operand
should be an 8-bit immediate value.
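
<div class="p"><!----></div>
Note that the shift amount counts bytes, not bits. A small sketch with
arbitrarily chosen registers and amounts:

<pre>
&nbsp;&nbsp;&nbsp;&nbsp;pslldq&nbsp;xmm0,4&nbsp;&nbsp;&nbsp;;&nbsp;shift&nbsp;left&nbsp;by&nbsp;4&nbsp;bytes,&nbsp;low&nbsp;4&nbsp;bytes&nbsp;become&nbsp;zero
&nbsp;&nbsp;&nbsp;&nbsp;psrldq&nbsp;xmm1,8&nbsp;&nbsp;&nbsp;;&nbsp;move&nbsp;high&nbsp;quad&nbsp;word&nbsp;into&nbsp;low,&nbsp;zero&nbsp;the&nbsp;high

</pre>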

<div class="p"><!----></div>
<a 
id="punpckhqdq21595"></a><a 
id="punpcklqdq21596"></a><tt>punpckhqdq</tt> interleaves the high quad word of the source operand and the
high quad word of the destination operand and writes them to the destination
SSE register. <tt>punpcklqdq</tt> interleaves the low quad word of the source operand
and the low quad word of the destination operand and writes them to the
destination SSE register. The source operand can be a 128-bit memory location
or SSE register.

<div class="p"><!----></div>
<a 
id="movntdq21597"></a><a 
id="movntpd21598"></a><a 
id="movnti21599"></a><a 

id="maskmovdqu21600"></a><tt>movntdq</tt> stores packed integer data from the SSE register to memory using
non-temporal hint. The source operand should be a SSE register, the
destination operand should be a 128-bit memory location. <tt>movntpd</tt> stores
packed double precision values from the SSE register to memory using a
non-temporal hint. Rules for operands are the same. <tt>movnti</tt> stores integer
from a general register to memory using a non-temporal hint. The source
operand should be a 32-bit general register, the destination operand should
be a 32-bit memory location. <tt>maskmovdqu</tt> stores selected bytes from the first
................................................................................
operand into a 128-bit memory location using a non-temporal hint. Both
operands should be SSE registers, the second operand selects which bytes from
the source operand are written to memory. The memory location is pointed by DI
(or EDI) register in the segment selected by DS and does not need to be
aligned.
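
<div class="p"><!----></div>
A sketch of non-temporal stores follows. The registers and offsets are
arbitrary, EDI is assumed to hold an address aligned on a 16-byte boundary, and
the closing <tt>sfence</tt> reflects a common practice of ordering such stores
before later ones, which is an assumption of this example and not a rule for
operands.

<pre>
&nbsp;&nbsp;&nbsp;&nbsp;movntdq&nbsp;[edi],xmm0&nbsp;&nbsp;&nbsp;;&nbsp;store&nbsp;128&nbsp;bits&nbsp;with&nbsp;non-temporal&nbsp;hint
&nbsp;&nbsp;&nbsp;&nbsp;movnti&nbsp;[edi+16],eax&nbsp;&nbsp;;&nbsp;store&nbsp;32&nbsp;bits&nbsp;with&nbsp;non-temporal&nbsp;hint
&nbsp;&nbsp;&nbsp;&nbsp;sfence&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;;&nbsp;order&nbsp;the&nbsp;stores&nbsp;above&nbsp;before&nbsp;later&nbsp;ones

</pre>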

<div class="p"><!----></div>
<a 
id="clflush21601"></a><tt>clflush</tt> writes and invalidates the cache line associated with the address
of byte specified with the operand, which should be an 8-bit memory location.

<div class="p"><!----></div>
<a 
id="lfence21602"></a><a 
id="mfence21603"></a><a 
id="sfence21604"></a><a 

id="lfence21605"></a><tt>lfence</tt> performs a serializing operation on all instructions loading from
memory that were issued prior to it. <tt>mfence</tt> performs a serializing operation
on all instructions accessing memory that were issued prior to it, and so it
combines the functions of <tt>sfence</tt> (described in previous section) and
<tt>lfence</tt> instructions. These instructions have no operands.

<div class="p"><!----></div>
     <a id="tth_sEc2.1.17"></a><h3>
2.1.17&nbsp;&nbsp;SSE3 instructions</h3>
Prescott technology introduced some new instructions to improve
the performance of SSE and SSE2 - this extension is called SSE3.

<div class="p"><!----></div>
<a 
id="fisttp21606"></a><tt>fisttp</tt> behaves like the <tt>fistp</tt> instruction and accepts the same operands,
the only difference is that it always uses truncation, irrespective of the
rounding mode.

<div class="p"><!----></div>
<a 
id="movshdup21607"></a><tt>movshdup</tt> loads into destination operand the 128-bit value obtained from
the source value of the same size by filling each quad word with the two
duplicates of the value in its high double word.

<div class="p"><!----></div>
<a 
id="movsldup21608"></a><tt>movsldup</tt> performs the same action, except it duplicates the values of low double words.
The destination operand should be SSE register, the source operand can be SSE register or
128-bit memory location.

<div class="p"><!----></div>
<a 
id="movddup21609"></a><tt>movddup</tt> loads the 64-bit source value and duplicates it into high and low
quad word of the destination operand. The destination operand should be SSE
register, the source operand can be SSE register or 64-bit memory location.
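
<div class="p"><!----></div>
The three duplicating moves described above can be sketched as follows. In the
comments the elements are listed starting from the lowest one, and the
registers are arbitrary.

<pre>
&nbsp;&nbsp;&nbsp;&nbsp;movshdup&nbsp;xmm0,xmm1&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;;&nbsp;(a0,a1,a2,a3)&nbsp;in&nbsp;xmm1&nbsp;gives&nbsp;(a1,a1,a3,a3)
&nbsp;&nbsp;&nbsp;&nbsp;movsldup&nbsp;xmm0,xmm1&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;;&nbsp;(a0,a1,a2,a3)&nbsp;in&nbsp;xmm1&nbsp;gives&nbsp;(a0,a0,a2,a2)
&nbsp;&nbsp;&nbsp;&nbsp;movddup&nbsp;xmm2,qword&nbsp;[esi]&nbsp;&nbsp;;&nbsp;both&nbsp;quad&nbsp;words&nbsp;get&nbsp;the&nbsp;value&nbsp;from&nbsp;memory

</pre>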

<div class="p"><!----></div>
<a 
id="lddqu21610"></a><tt>lddqu</tt> is functionally equivalent to <tt>movdqu</tt> with memory as
source operand, but it may improve performance when the source operand crosses
a cacheline boundary. The destination operand has to be SSE register, the source
operand must be 128-bit memory location.

<div class="p"><!----></div>
<a 
id="adddubps21611"></a><tt>addsubps</tt> performs single precision addition of second and fourth pairs and
single precision subtraction of the first and third pairs of floating point
values in the operands.

<div class="p"><!----></div>
<a 
id="addsubpd21612"></a><tt>addsubpd</tt> performs double precision addition of the
second pair and double precision subtraction of the first pair of floating
point values in the operands.

<div class="p"><!----></div>
<a 
id="haddps21613"></a><tt>haddps</tt> performs the addition of two single
precision values within each quad word of source and destination operands,
and stores the results of such horizontal addition of values from destination
operand into low quad word of destination operand, and the results from the
source operand into high quad word of destination operand.

<div class="p"><!----></div>
<a 
id="haddpd21614"></a><tt>haddpd</tt> performs
the addition of two double precision values within each operand, and stores
the result from destination operand into low quad word of destination operand,
and the result from source operand into high quad word of destination operand.
All these instructions need the destination operand to be SSE register, source
operand can be SSE register or 128-bit memory location.
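
<div class="p"><!----></div>
A brief sketch of the horizontal and mixed operations, with arbitrarily chosen
registers. Using the same register as both operands of <tt>haddpd</tt> is just
one possible way of summing the two values it holds.

<pre>
&nbsp;&nbsp;&nbsp;&nbsp;haddpd&nbsp;xmm0,xmm0&nbsp;&nbsp;&nbsp;&nbsp;;&nbsp;both&nbsp;quad&nbsp;words&nbsp;get&nbsp;the&nbsp;sum&nbsp;of&nbsp;two&nbsp;doubles
&nbsp;&nbsp;&nbsp;&nbsp;addsubps&nbsp;xmm2,xmm3&nbsp;&nbsp;;&nbsp;subtract&nbsp;in&nbsp;1st&nbsp;and&nbsp;3rd,&nbsp;add&nbsp;in&nbsp;2nd&nbsp;and&nbsp;4th

</pre>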

<div class="p"><!----></div>
<a 
id="monitor21615"></a><tt>monitor</tt> sets up an address range for monitoring of write-back stores. It
needs its three operands to be EAX, ECX and EDX registers in that order.

<div class="p"><!----></div>
<a 
id="mwait21616"></a><tt>mwait</tt> waits for a write-back store to the address range set up by the
<tt>monitor</tt> instruction.
It uses two operands with additional parameters, first being the EAX and second
the ECX register.

<div class="p"><!----></div>
The functionality of SSE3 is further extended by the set of Supplemental
SSE3 instructions (SSSE3). They generally follow the same rules for operands
as all the MMX operations extended by SSE.

<div class="p"><!----></div>
<a 
id="phaddw21617"></a><a 
id="phaddd21618"></a><a 
id="phaddsw21619"></a><a 
id="phsubw21620"></a><a 
id="phsubd21621"></a><a 

id="phsubsw21622"></a><tt>phaddw</tt> and <tt>phaddd</tt> perform the horizontal addition of the pairs of
adjacent values from both the source and destination operand, and store the
sums into the destination (sums from the destination operand go into the lower
part of the destination register, sums from the source operand into the upper
part). They operate on 16-bit or 32-bit chunks, respectively.
<tt>phaddsw</tt> performs the same operation on signed 16-bit packed values, but the
result of each addition is saturated. <tt>phsubw</tt> and <tt>phsubd</tt> analogously
perform the horizontal subtraction of 16-bit or 32-bit packed values, and
<tt>phsubsw</tt> performs the horizontal subtraction of signed 16-bit packed values
with saturation.

<div class="p"><!----></div>
<a 
id="pabsb21623"></a><a 
id="pabsw21624"></a><a 
id="pabsd21625"></a><tt>pabsb</tt>, <tt>pabsw</tt> and <tt>pabsd</tt> calculate the absolute value of each
packed signed value in source operand and store them into the destination
register. They operate on 8-bit, 16-bit and 32-bit elements respectively.

<div class="p"><!----></div>
<a 
id="pmaddubsw21626"></a><tt>pmaddubsw</tt> multiplies signed 8-bit values from the source operand with the
corresponding unsigned 8-bit values from the destination operand to produce
intermediate 16-bit values, and every adjacent pair of those intermediate
values is then added horizontally and those 16-bit sums are stored into the
destination operand.

<div class="p"><!----></div>
<a 
id="pmulhrsw21627"></a><tt>pmulhrsw</tt> multiplies corresponding 16-bit integers from the source and
destination operand to produce intermediate 32-bit values, and the 16 bits
next to the highest bit of each of those values are then rounded and packed
into the destination operand.

<div class="p"><!----></div>
<a 
id="pshufb21628"></a><tt>pshufb</tt> shuffles the bytes in the destination operand according to the
mask provided by source operand - each of the bytes in source operand is
an index of the byte of the original destination value that is copied into the
corresponding position of the result.
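
<div class="p"><!----></div>
As a sketch, a mask listing the indexes in descending order reverses the order
of all 16 bytes. The label <tt>rev_mask</tt> is a hypothetical name introduced
only for this example, and the mask is assumed to be placed at an address
aligned on a 16-byte boundary.

<pre>
&nbsp;&nbsp;&nbsp;&nbsp;pshufb&nbsp;xmm0,dqword&nbsp;[rev_mask]&nbsp;&nbsp;;&nbsp;reverse&nbsp;the&nbsp;order&nbsp;of&nbsp;bytes&nbsp;in&nbsp;xmm0

&nbsp;&nbsp;&nbsp;&nbsp;align&nbsp;16&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;;&nbsp;keep&nbsp;the&nbsp;mask&nbsp;aligned&nbsp;in&nbsp;memory
&nbsp;&nbsp;rev_mask&nbsp;db&nbsp;15,14,13,12,11,10,9,8,7,6,5,4,3,2,1,0&nbsp;&nbsp;;&nbsp;hypothetical&nbsp;label

</pre>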

<div class="p"><!----></div>
<a 
id="psignb21629"></a><a 
id="psignw21630"></a><a 
id="psignd21631"></a><tt>psignb</tt>, <tt>psignw</tt> and <tt>psignd</tt> perform the operation on 8-bit, 16-bit or
32-bit integers in destination operand, depending on the signs of the values
in the source. If the value in source is negative, the corresponding value in
the destination register is negated, if the value in source is positive, no
operation is performed on the corresponding value, and if the
value in source is zero, the value in destination is zeroed, too.

<div class="p"><!----></div>
<a 
id="palifnr21632"></a><tt>palignr</tt> appends the source operand to the destination operand to form the
intermediate value of twice the size, and then extracts into the destination
register the 64 or 128 bits that are right-aligned to the byte offset
specified by the third operand, which should be an 8-bit immediate value. This
is the only SSSE3 instruction that takes three arguments.
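
<div class="p"><!----></div>
A sketch of both operand sizes follows, with arbitrary registers and offsets.
It takes the destination operand as the upper half of the intermediate value
and the source operand as the lower half.

<pre>
&nbsp;&nbsp;&nbsp;&nbsp;palignr&nbsp;xmm0,xmm1,4&nbsp;&nbsp;;&nbsp;low&nbsp;12&nbsp;bytes&nbsp;=&nbsp;bytes&nbsp;4-15&nbsp;of&nbsp;xmm1,
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;;&nbsp;high&nbsp;4&nbsp;bytes&nbsp;=&nbsp;bytes&nbsp;0-3&nbsp;of&nbsp;old&nbsp;xmm0
&nbsp;&nbsp;&nbsp;&nbsp;palignr&nbsp;mm0,mm1,2&nbsp;&nbsp;&nbsp;&nbsp;;&nbsp;the&nbsp;same&nbsp;operation&nbsp;on&nbsp;64-bit&nbsp;MMX&nbsp;registers

</pre>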

<div class="p"><!----></div>
     <a id="tth_sEc2.1.18"></a><h3>
2.1.18&nbsp;&nbsp;AMD 3DNow! instructions</h3>
The 3DNow! extension adds new MMX instructions to those described in <a href="#sec:MMX_instructions">2.1.14</a>,
and introduces operations on the 64-bit packed floating point values, each
consisting of two single precision floating point values.

<div class="p"><!----></div>
<a 
id="pavgusb21633"></a><a 
id="pmulhrw21634"></a><a 

id="pi2fd21635"></a><a 
id="pf2id21636"></a><a 
id="pi2fw21637"></a><a 
id="pf2iw21638"></a><a 
id="pfadd21639"></a><a 
id="pfsub21640"></a><a 
id="pfsubr21641"></a><a 
id="pfmul21642"></a><a 
id="pfacc21643"></a><a 
id="pfnacc21644"></a><a 
id="pfpnacc21645"></a><a 

id="pfmax21646"></a><a 
id="pfmin21647"></a><a 
id="pswapd21648"></a><a 
id="pfrcp21649"></a><a 
id="pfrsqrt21650"></a><a 
id="pfrcpit121651"></a><a 
id="pfrsqit121652"></a><a 
id="pfrcpit221653"></a><a 
id="pfcmpeq21654"></a><a 
id="pfcmpge21655"></a><a 

id="pfcmpgt21656"></a>These instructions follow the same rules as the general MMX operations, the
destination operand should be a MMX register, the source operand can be a MMX
register or 64-bit memory location. 
<tt>pavgusb</tt> computes the rounded averages
of packed unsigned bytes. <tt>pmulhrw</tt> performs a signed multiplication of the packed
words, rounds the high word of each double word result and stores them in the
destination operand. <tt>pi2fd</tt> converts packed double word integers into
packed floating point values. <tt>pf2id</tt> converts packed floating point values
into packed double word integers using truncation. <tt>pi2fw</tt> converts packed
word integers into packed floating point values, only low words of each
double word in source operand are used. <tt>pf2iw</tt> converts packed floating
point values to packed word integers, results are extended to double words
using the sign extension. <tt>pfadd</tt> adds packed floating point values. <tt>pfsub</tt>
and <tt>pfsubr</tt> subtract packed floating point values, the first one subtracts
source values from destination values, the second one subtracts destination
values from the source values. <tt>pfmul</tt> multiplies packed floating point
values. <tt>pfacc</tt> adds the low and high floating point values of the destination
operand, storing the result in the low double word of destination, and adds
the low and high floating point values of the source operand, storing the
result in the high double word of destination. <tt>pfnacc</tt> subtracts the high
floating point value of the destination operand from the low, storing the
result in the low double word of destination, and subtracts the high floating
point value of the source operand from the low, storing the result in the high
double word of destination. <tt>pfpnacc</tt> subtracts the high floating point value
of the destination operand from the low, storing the result in the low double
word of destination, and adds the low and high floating point values of the
source operand, storing the result in the high double word of destination.
<tt>pfmax</tt> and <tt>pfmin</tt> compute the maximum and minimum of floating point values.
<tt>pswapd</tt> reverses the high and low double word of the source operand. <tt>pfrcp</tt>
returns estimates of the reciprocals of floating point values from the
source operand, <tt>pfrsqrt</tt> returns estimates of the reciprocal square
................................................................................
all bits or zeroes all bits of the corresponding data element in the
destination operand according to the result of comparison, first checks
whether values are equal, second checks whether destination value is greater
or equal to source value, third checks whether destination value is greater
than source value.

<div class="p"><!----></div>
<a 
id="prefetch21657"></a><a 
id="prefetchw21658"></a><tt>prefetch</tt> and <tt>prefetchw</tt> load the line of data from memory that contains
byte specified with the operand into the data cache, <tt>prefetchw</tt> instruction
should be used when the data in the cache line is expected to be modified,
otherwise the <tt>prefetch</tt> instruction should be used. The operand should be an
8-bit memory location.

<div class="p"><!----></div>
<a 
id="femms21659"></a><tt>femms</tt> performs a fast clear of MMX state. It has no operands.

<div class="p"><!----></div>
     <a id="tth_sEc2.1.19"></a><h3>
2.1.19&nbsp;&nbsp;The x86-64 long mode instructions</h3>

<div class="p"><!----></div>
The AMD64 and EM64T architectures (we will use the common name x86-64 for them
................................................................................

<div class="p"><!----></div>

<div class="p"><!----></div>
<a id="tth_tAb2.4">
</a> 
<div style="text-align:center">
<table border="1">
<tr><td align="center">Type </td><td colspan="4" align="center">General </td><td align="center">SSE </td><td align="center">AVX </td></tr>
<tr><td align="center">Bits </td><td align="center">8 </td><td align="center">16 </td><td align="center">32 </td><td align="center">64 </td><td align="center">128 </td><td align="center">256 </td></tr><tr><td></td></tr>
<tr><td align="center"></td><td align="center"></td><td align="center"></td><td align="center"></td><td align="center"><tt>rax</tt> </td><td align="center"></td><td align="center"></td></tr>
<tr><td align="center"></td><td align="center"></td><td align="center"></td><td align="center"></td><td align="center"><tt>rcx</tt> </td><td align="center"></td><td align="center"></td></tr>
<tr><td align="center"></td><td align="center"></td><td align="center"></td><td align="center"></td><td align="center"><tt>rdx</tt> </td><td align="center"></td><td align="center"></td></tr>
<tr><td align="center"></td><td align="center"></td><td align="center"></td><td align="center"></td><td align="center"><tt>rbx</tt> </td><td align="center"></td><td align="center"></td></tr>
<tr><td align="center"></td><td align="center"><tt>spl</tt> </td><td align="center"></td><td align="center"></td><td align="center"><tt>rsp</tt> </td><td align="center"></td><td align="center"></td></tr>
................................................................................
<div class="p"><!----></div>
If any operation is performed on the 32-bit general registers in long mode,
the upper 32 bits of the 64-bit registers containing them are filled with
zeros. This is unlike the operations on 16-bit or 8-bit portions of those
registers, which preserve the upper bits.
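
<div class="p"><!----></div>
The difference can be seen in a short sketch, assuming the code is assembled
for long mode; the registers and values are arbitrary.

<pre>
&nbsp;&nbsp;&nbsp;&nbsp;mov&nbsp;rax,-1&nbsp;&nbsp;&nbsp;;&nbsp;RAX&nbsp;=&nbsp;0FFFFFFFFFFFFFFFFh
&nbsp;&nbsp;&nbsp;&nbsp;mov&nbsp;eax,1&nbsp;&nbsp;&nbsp;&nbsp;;&nbsp;upper&nbsp;32&nbsp;bits&nbsp;are&nbsp;zeroed,&nbsp;RAX&nbsp;=&nbsp;1
&nbsp;&nbsp;&nbsp;&nbsp;mov&nbsp;rbx,-1&nbsp;&nbsp;&nbsp;;&nbsp;RBX&nbsp;=&nbsp;0FFFFFFFFFFFFFFFFh
&nbsp;&nbsp;&nbsp;&nbsp;mov&nbsp;bx,1&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;;&nbsp;upper&nbsp;48&nbsp;bits&nbsp;are&nbsp;preserved,&nbsp;RBX&nbsp;=&nbsp;0FFFFFFFFFFFF0001h

</pre>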

<div class="p"><!----></div>
<a 
id="cdqe21660"></a><a 
id="cqo21661"></a><a 
id="movsxd21662"></a>Three new type conversion instructions are available. The <tt>cdqe</tt> sign extends
the double word in EAX into quad word and stores the result in RAX register.
<tt>cqo</tt> sign extends the quad word in RAX into double quad word and stores the
extra bits in the RDX register. These instructions have no operands.
<tt>movsxd</tt> sign extends the double word source operand, being either the 32-bit register
or memory, into 64-bit destination operand, which has to be register.
No analogous instruction is needed for the zero extension, since it is done
automatically by any operations on 32-bit registers, as noted in previous
................................................................................
indirect far jumps and calls allow any operands that were allowed by the x86
architecture and also 80-bit memory operand is allowed (though only EM64T seems
to implement such variant), with the first eight bytes defining the offset and
two last bytes specifying the selector. The direct far jumps and calls are not
allowed in long mode.

<div class="p"><!----></div>
<a 
id="movsq21663"></a><a 
id="cmpsq21664"></a><a 
id="scasq21665"></a><a 
id="lodsq21666"></a><a 
id="stosq21667"></a>The I/O instructions, <tt>in</tt>, <tt>out</tt>, <tt>ins</tt> and <tt>outs</tt> are the exceptional
instructions that are not extended to accept quad word operands in long mode.
But all other string operations are, and there are new short forms <tt>movsq</tt>,
<tt>cmpsq</tt>, <tt>scasq</tt>, <tt>lodsq</tt> and <tt>stosq</tt> introduced for the variants of string
operations for 64-bit string elements. The RSI and RDI registers are used by
default to address the string elements.
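
<div class="p"><!----></div>
A sketch of copying a block of quad words, assuming the code is assembled for
long mode and that RSI and RDI were already loaded with the addresses of source
and destination; the element count is arbitrary.

<pre>
&nbsp;&nbsp;&nbsp;&nbsp;cld&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;;&nbsp;process&nbsp;elements&nbsp;toward&nbsp;higher&nbsp;addresses
&nbsp;&nbsp;&nbsp;&nbsp;mov&nbsp;ecx,16&nbsp;&nbsp;&nbsp;;&nbsp;count&nbsp;of&nbsp;elements,&nbsp;upper&nbsp;half&nbsp;of&nbsp;RCX&nbsp;gets&nbsp;zeroed
&nbsp;&nbsp;&nbsp;&nbsp;rep&nbsp;movsq&nbsp;&nbsp;&nbsp;&nbsp;;&nbsp;copy&nbsp;16&nbsp;quad&nbsp;words&nbsp;from&nbsp;[rsi]&nbsp;to&nbsp;[rdi]

</pre>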

<div class="p"><!----></div>
................................................................................
implement such variant). The <tt>lds</tt> and <tt>les</tt> are disallowed in long mode.

<div class="p"><!----></div>
The system instructions like <tt>lgdt</tt> which required the 48-bit memory operand,
in long mode require the 80-bit memory operand.

<div class="p"><!----></div>
<a 
id="cmpxchg16b21668"></a>The <tt>cmpxchg16b</tt> is the 64-bit equivalent of <tt>cmpxchg8b</tt> instruction, it uses
the double quad word memory operand and 64-bit registers to perform the analogous operation.
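
<div class="p"><!----></div>
A minimal sketch, assuming long mode and an address in RDI aligned on a 16-byte
boundary. The <tt>lock</tt> prefix is an addition of this example, commonly
used when the memory is shared between processors.

<pre>
&nbsp;&nbsp;&nbsp;&nbsp;lock&nbsp;cmpxchg16b&nbsp;[rdi]&nbsp;&nbsp;;&nbsp;if&nbsp;RDX:RAX&nbsp;equals&nbsp;memory,&nbsp;store&nbsp;RCX:RBX&nbsp;there
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;;&nbsp;otherwise&nbsp;load&nbsp;memory&nbsp;into&nbsp;RDX:RAX

</pre>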

<div class="p"><!----></div>
<a 
id="fxsave6421669"></a><a 
id="fxrstor6421670"></a>The <tt>fxsave64</tt> and <tt>fxrstor64</tt> are new variants of <tt>fxsave</tt> and <tt>fxrstor</tt>
instructions, available only in long mode, which use a different format of
storage area in order to store some pointers in full 64-bit size.

<div class="p"><!----></div>
<a 
id="swapgs21671"></a><tt>swapgs</tt> is a new instruction, which swaps the contents of GS register and
the KernelGSbase model-specific register (MSR address 0C0000102h).

<div class="p"><!----></div>
<a 
id="syscall21672"></a><a 
id="sysret21673"></a><a 
id="sysexitq21674"></a><a 

id="sysretq21675"></a><tt>syscall</tt> and <tt>sysret</tt> are a pair of new instructions that provide the
functionality similar to <tt>sysenter</tt> and <tt>sysexit</tt> in long mode, where the
latter pair is disallowed. The <tt>sysexitq</tt> and <tt>sysretq</tt> mnemonics provide the
64-bit versions of <tt>sysexit</tt> and <tt>sysret</tt> instructions.

<div class="p"><!----></div>
<a 
id="rdmsrq21676"></a><a 
id="wrmsrq21677"></a>The <tt>rdmsrq</tt> and <tt>wrmsrq</tt> mnemonics are the 64-bit variants of the <tt>rdmsr</tt>
and <tt>wrmsr</tt> instructions.

<div class="p"><!----></div>
     <a id="tth_sEc2.1.20"></a><h3>
2.1.20&nbsp;&nbsp;SSE4 instructions</h3>

<div class="p"><!----></div>
................................................................................
<div class="p"><!----></div>
The SSE4.1 instructions mostly follow the same rules for operands, as
the basic SSE operations, so they require destination operand to be SSE
register and source operand to be 128-bit memory location or SSE register,
and some operations require a third operand, the 8-bit immediate value.

<div class="p"><!----></div>
<a 
id="pmulld21678"></a><a 
id="pmuldq21679"></a><a 
id="pminsb21680"></a><a 
id="pmaxsb21681"></a><a 
id="pminuw21682"></a><a 
id="pmaxuw21683"></a><a 
id="pminud21684"></a><a 
id="pmaxud21685"></a><a 
id="pminsd21686"></a><a 

id="pmaxsd21687"></a><tt>pmulld</tt> performs a signed multiplication of the packed double words and
stores the low double words of the results in the destination operand.
<tt>pmuldq</tt> performs two signed multiplications of the corresponding double
words in the lower quad words of operands, and stores the results as
packed quad words into the destination register. <tt>pminsb</tt> and <tt>pmaxsb</tt>
return the minimum or maximum values of packed signed bytes, <tt>pminuw</tt> and
<tt>pmaxuw</tt> return the minimum and maximum values of packed unsigned words,
<tt>pminud</tt>, <tt>pmaxud</tt>, <tt>pminsd</tt> and <tt>pmaxsd</tt> return minimum or maximum values
of packed unsigned or signed double words. These instructions complement the
instructions computing packed minimum or maximum introduced by SSE.

<div class="p"><!----></div>
<a 
id="ptest21688"></a><a 
id="pcmpeqq21689"></a><tt>ptest</tt> sets the ZF flag to one when the result of bitwise AND of
both operands is zero, and zeroes the ZF otherwise. It also sets CF flag
to one, when the result of bitwise AND of the destination operand with
the bitwise NOT of the source operand is zero, and zeroes the CF otherwise.
<tt>pcmpeqq</tt> compares packed quad words for equality, and fills the
corresponding elements of destination operand with either ones or zeros,
depending on the result of comparison.
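<div class="p"><!----></div>
The flag rules of <tt>ptest</tt> can be illustrated with a small Python model. This is
only a sketch: the function name <tt>ptest</tt> and the use of plain integers for the
128-bit operands are assumptions made for illustration, and CF is computed as in
Intel's documented operation, from the source ANDed with the complement of the
destination.

```python
MASK128 = (1 << 128) - 1


def ptest(dest: int, src: int) -> tuple[int, int]:
    """Return (ZF, CF) for two 128-bit values given as Python integers."""
    # ZF is set when the two operands have no set bit in common.
    zf = 1 if (dest & src) & MASK128 == 0 else 0
    # CF is set when every bit set in the source is also set in the destination.
    cf = 1 if (~dest & src) & MASK128 == 0 else 0
    return zf, cf
```

With this model, CF set to one means that the destination contains every bit that
is set in the source.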

<div class="p"><!----></div>
<a 
id="packusdw21690"></a><tt>packusdw</tt> converts packed signed double words from both the source and
destination operand into the unsigned words using saturation, and stores
the eight resulting word values into the destination register.
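<div class="p"><!----></div>
A minimal Python sketch of the saturation performed by <tt>packusdw</tt> follows. The
helper names and the list representation (lowest element first) are assumptions
made for illustration; the four values from the destination land in the low
half of the result and the four values from the source in the high half.

```python
def saturate_u16(value: int) -> int:
    """Clamp a signed value into the unsigned 16-bit range."""
    return max(0, min(0xFFFF, value))


def packusdw(dest: list[int], src: list[int]) -> list[int]:
    """dest and src hold four signed double words each, lowest element first."""
    return [saturate_u16(v) for v in dest + src]
```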

<div class="p"><!----></div>
<a 
id="phminposuw21691"></a><tt>phminposuw</tt> finds the minimum unsigned word value in source operand
and places it into the lowest word of destination operand, stores the index
of that word in bits 16-18, and sets the remaining upper bits of destination
to zero.

<div class="p"><!----></div>
<a 
id="roundps21692"></a><a 
id="roundss21693"></a><a 
id="roundpd21694"></a><a 

id="roundsd21695"></a><tt>roundps</tt>, <tt>roundss</tt>, <tt>roundpd</tt> and <tt>roundsd</tt> perform the rounding of
packed or individual floating point value of single or double precision,
using the rounding mode specified by the third operand.

<pre>
&nbsp;&nbsp;&nbsp;&nbsp;roundsd&nbsp;xmm0,xmm1,0011b&nbsp;;&nbsp;round&nbsp;toward&nbsp;zero

</pre>
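<div class="p"><!----></div>
The meaning of the low bits of the rounding immediate can be sketched in Python.
This is a hedged model, not the processor's implementation: the function name is
hypothetical, only finite values are handled, the sign of a zero result is not
preserved, and the case where bit 2 selects the rounding mode from MXCSR is left
out. Bits 0-1 select the mode (00 nearest, 01 down, 10 up, 11 toward zero).

```python
import math


def round_with_imm(value: float, imm8: int) -> float:
    """Round a finite float the way bits 0-1 of the immediate select."""
    if imm8 & 0b0100:
        raise ValueError("bit 2 set: rounding mode comes from MXCSR, not modelled")
    mode = imm8 & 0b0011
    if mode == 0b00:
        return float(round(value))       # to nearest, ties to even
    if mode == 0b01:
        return float(math.floor(value))  # toward negative infinity
    if mode == 0b10:
        return float(math.ceil(value))   # toward positive infinity
    return float(math.trunc(value))      # toward zero
```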

<div class="p"><!----></div>
<a 
id="dpps21696"></a><a 
id="dppd21697"></a><a 
id="mpsadbw21698"></a><a 

id="roundps21699"></a><tt>dpps</tt> calculates dot product of packed single precision floating point
values, that is, it multiplies the corresponding pairs of values from source and
destination operand and then sums the products up. The high four bits of the
8-bit immediate third operand control which products are calculated and taken
to the sum, and the low four bits control, into which elements of destination
the resulting dot product is copied (the other elements are filled with zero).
<tt>dppd</tt> calculates dot product of packed double precision floating point values.
The bits 4 and 5 of third operand control, which products are calculated and
................................................................................
at the position one byte after the position of previous block. The four bytes
from the source stay the same each time. This way eight sums of absolute
differences are calculated and stored as packed word values into the
destination operand. The instructions described in this paragraph follow the
same rules for operands, as <tt>roundps</tt> instruction.
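<div class="p"><!----></div>
The role of the immediate in <tt>dpps</tt> can be sketched as follows. The function
name and the list representation (lowest element first) are assumptions, and the
model uses Python floats, so it does not reproduce the single precision rounding
or the summation order of the real instruction.

```python
def dpps(dest: list[float], src: list[float], imm8: int) -> list[float]:
    """Model of the dot product selection done by the 8-bit immediate."""
    total = 0.0
    for i in range(4):
        # High four bits: which products take part in the sum.
        if imm8 & (1 << (4 + i)):
            total += dest[i] * src[i]
    # Low four bits: which elements receive the sum, the others become zero.
    return [total if imm8 & (1 << i) else 0.0 for i in range(4)]
```

For example, an immediate of F1h sums all four products and writes the result
only into the lowest element.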

<div class="p"><!----></div>
<a 
id="blendps21700"></a><a 
id="blendvps21701"></a><a 
id="blendpd21702"></a><a 

id="blendvpd21703"></a><tt>blendps</tt>, <tt>blendvps</tt>, <tt>blendpd</tt> and <tt>blendvpd</tt> conditionally copy the
values from source operand into the destination operand, depending on the bits
of the mask provided by third operand. If a mask bit is set, the corresponding
element of source is copied into the same place in destination, otherwise this
position in destination is left unchanged. The rules for the first two operands
are the same, as for general SSE instructions. <tt>blendps</tt> and <tt>blendpd</tt> need
third operand to be 8-bit immediate, and they operate on single or double
precision values, respectively. <tt>blendvps</tt> and <tt>blendvpd</tt> require third operand
................................................................................

<pre>
&nbsp;&nbsp;&nbsp;&nbsp;blendvps&nbsp;xmm3,xmm7,xmm0&nbsp;;&nbsp;blend&nbsp;according&nbsp;to&nbsp;mask

</pre>

<div class="p"><!----></div>
<a 
id="pblendw21704"></a><a 
id="pblendvb21705"></a><tt>pblendw</tt> conditionally copies word elements from the source operand into the
destination, depending on the bits of mask provided by third operand, which
needs to be 8-bit immediate value. <tt>pblendvb</tt> conditionally copies byte
elements from the source operand into destination, depending on mask defined
by the third operand, which has to be XMM0 register. These instructions follow
the same rules for operands as <tt>blendps</tt> and <tt>blendvps</tt> instructions,
respectively.

<div class="p"><!----></div>
<a 
id="insertps21706"></a><tt>insertps</tt> inserts a single precision floating point value taken from the
position in source operand specified by bits 6-7 of third operand into location
in destination register selected by bits 4-5 of third operand. Additionally,
the low four bits of third operand control, which elements in destination
register will be set to zero. The first two operands follow the same rules as
for the general SSE operation, the third operand should be 8-bit immediate.
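<div class="p"><!----></div>
The three bit fields of the <tt>insertps</tt> immediate can be illustrated with a short
Python model. It is a sketch under stated assumptions: the function name is
hypothetical, operands are lists of four values (lowest element first), and the
source is taken to be a register.

```python
def insertps(dest: list[float], src: list[float], imm8: int) -> list[float]:
    """Model of insertps with a register source."""
    src_index = (imm8 >> 6) & 0b11   # bits 6-7: element taken from source
    dst_index = (imm8 >> 4) & 0b11   # bits 4-5: position written in destination
    zero_mask = imm8 & 0b1111        # bits 0-3: elements cleared afterwards
    result = list(dest)
    result[dst_index] = src[src_index]
    return [0.0 if zero_mask & (1 << i) else result[i] for i in range(4)]
```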

<div class="p"><!----></div>
<a 
id="extractps21707"></a><tt>extractps</tt> extracts a single precision floating point value taken from the
location in source operand specified by low two bits of third operand, and
stores it into the destination operand. The destination can be a 32-bit memory
value or general purpose register, the source operand must be SSE register,
and the third operand should be 8-bit immediate value.

<pre>
&nbsp;&nbsp;&nbsp;&nbsp;extractps&nbsp;edx,xmm3,3&nbsp;;&nbsp;extract&nbsp;the&nbsp;highest&nbsp;value

</pre>

<div class="p"><!----></div>
<a 
id="pinsrb21708"></a><a 
id="pinsrd21709"></a><a 
id="pinsrq21710"></a><tt>pinsrb</tt>, <tt>pinsrd</tt> and <tt>pinsrq</tt> copy a byte, double word or quad word from
the source operand into the location of destination operand determined by the
third operand. The destination operand has to be SSE register, the source
operand can be a memory location of appropriate size, or the 32-bit general
purpose register (but 64-bit general purpose register for <tt>pinsrq</tt>, which is
only available in long mode), and the third operand has to be 8-bit immediate
value. These instructions complement the <tt>pinsrw</tt> instruction operating on SSE
register destination, which was introduced by SSE2.
................................................................................

<pre>
&nbsp;&nbsp;&nbsp;&nbsp;pinsrd&nbsp;xmm4,eax,1&nbsp;;&nbsp;insert&nbsp;double&nbsp;word&nbsp;into&nbsp;second&nbsp;position

</pre>

<div class="p"><!----></div>
<a 
id="pextrb21711"></a><a 
id="pextrw21712"></a><a 
id="pextrd21713"></a><a 
id="pextrq21714"></a><tt>pextrb</tt>, <tt>pextrw</tt>, <tt>pextrd</tt> and <tt>pextrq</tt> copy a byte, word, double word or
quad word from the location in source operand specified by third operand, into
the destination. The source operand should be SSE register, the third operand
should be 8-bit immediate, and the destination operand can be memory location
of appropriate size, or the 32-bit general purpose register (but 64-bit general
purpose register for <tt>pextrq</tt>, which is only available in long mode). The
<tt>pextrw</tt> instruction with SSE register as source was already introduced by
SSE2, but SSE4 extends it to allow memory operand as destination.
................................................................................

<pre>
&nbsp;&nbsp;&nbsp;&nbsp;pextrw&nbsp;[ebx],xmm3,7&nbsp;;&nbsp;extract&nbsp;highest&nbsp;word&nbsp;into&nbsp;memory

</pre>

<div class="p"><!----></div>
<a 
id="pmovsxbw21715"></a><a 
id="pmovzxbw21716"></a><a 
id="pmovsxbd21717"></a><a 
id="pmovzxbd21718"></a><a 
id="pmovsxbq21719"></a><a 
id="pmovzxbq21720"></a><a 
id="pmovsxwd21721"></a><a 
id="pmovzxwd21722"></a><a 
id="pmovsxwq21723"></a><a 
id="pmovzxwq21724"></a><a 
id="pmovsxdq21725"></a><a 

id="pmovzxdq21726"></a><tt>pmovsxbw</tt> and <tt>pmovzxbw</tt> perform sign extension or zero extension of eight
byte values from the source operand into packed word values in destination
operand, which has to be SSE register. The source can be 64-bit memory or SSE
register - when it is register, only its low portion is used. <tt>pmovsxbd</tt> and
<tt>pmovzxbd</tt> perform sign extension or zero extension of the four byte values
from the source operand into packed double word values in destination operand,
the source can be 32-bit memory or SSE register. <tt>pmovsxbq</tt> and <tt>pmovzxbq</tt>
perform sign extension or zero extension of the two byte values from the
................................................................................
<pre>
&nbsp;&nbsp;&nbsp;&nbsp;pmovzxbq&nbsp;xmm0,word&nbsp;[si]&nbsp;&nbsp;;&nbsp;zero-extend&nbsp;bytes&nbsp;to&nbsp;quad&nbsp;words
&nbsp;&nbsp;&nbsp;&nbsp;pmovsxwq&nbsp;xmm0,xmm1&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;;&nbsp;sign-extend&nbsp;words&nbsp;to&nbsp;quad&nbsp;words

</pre>

<div class="p"><!----></div>
<a 
id="movntdqa21727"></a><tt>movntdqa</tt> loads double quad word from the source operand to the destination
using a non-temporal hint. The destination operand should be SSE register,
and the source operand should be 128-bit memory location.

<div class="p"><!----></div>
The SSE4.2, described below, adds not only some new operations on SSE
registers, but also introduces some completely new instructions operating on
general purpose registers only.

<div class="p"><!----></div>
<a 
id="pcmpistri21728"></a><a 
id="pcmpistrm21729"></a><a 
id="pcmpestri21730"></a><a 

id="pcmpestrm21731"></a><tt>pcmpistri</tt> compares two zero-ended (implicit length) strings provided in
its source and destination operand and generates an index stored to ECX;
<tt>pcmpistrm</tt> performs the same comparison and generates a mask stored to XMM0.
<tt>pcmpestri</tt> compares two strings of explicit lengths, with length provided
in EAX for the destination operand and in EDX for the source operand, and
generates an index stored to ECX; <tt>pcmpestrm</tt> performs the same comparison
and generates a mask stored to XMM0. The source and destination operand follow
the same rules as for general SSE instructions, the third operand should be
8-bit immediate value determining the details of performed operation - refer to
Intel documentation for information on those details.

<div class="p"><!----></div>
<a 
id="pcmpgtq21732"></a><tt>pcmpgtq</tt> compares packed quad words, and fills the corresponding elements of
destination operand with either ones or zeros, depending on whether the value
in destination is greater than the one in source, or not. This instruction
follows the same rules for operands as <tt>pcmpeqq</tt>.

<div class="p"><!----></div>
<a 
id="crc3221733"></a><tt>crc32</tt> accumulates a CRC32 value for the source operand starting with
initial value provided by destination operand, and stores the result in
destination. Unless in long mode, the destination operand should be a 32-bit
general purpose register, and the source operand can be a byte, word, or double
word register or memory location. In long mode the destination operand can
also be a 64-bit general purpose register, and the source operand in such case
can be a byte or quad word register or memory location.

................................................................................
&nbsp;&nbsp;&nbsp;&nbsp;crc32&nbsp;eax,dl&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;;&nbsp;accumulate&nbsp;CRC32&nbsp;on&nbsp;byte&nbsp;value
&nbsp;&nbsp;&nbsp;&nbsp;crc32&nbsp;eax,word&nbsp;[ebx]&nbsp;&nbsp;;&nbsp;accumulate&nbsp;CRC32&nbsp;on&nbsp;word&nbsp;value
&nbsp;&nbsp;&nbsp;&nbsp;crc32&nbsp;rax,qword&nbsp;[rbx]&nbsp;;&nbsp;accumulate&nbsp;CRC32&nbsp;on&nbsp;quad&nbsp;word&nbsp;value

</pre>
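<div class="p"><!----></div>
The accumulation done by <tt>crc32</tt> can be modelled bit by bit in Python. This is
a sketch with a hypothetical function name; it assumes the CRC-32C (Castagnoli)
polynomial in its reflected form 82F63B78h, which is the one this instruction
uses, and processes a multi-byte source starting from its lowest byte. The
instruction itself applies no initial or final inversion, so the caller does
that when a conventional checksum is wanted.

```python
CRC32C_POLY_REFLECTED = 0x82F63B78


def crc32_step(crc: int, data: int, size_bytes: int = 1) -> int:
    """Accumulate size_bytes bytes of data into crc, lowest byte first."""
    for i in range(size_bytes):
        crc ^= (data >> (8 * i)) & 0xFF
        for _ in range(8):
            crc = (crc >> 1) ^ (CRC32C_POLY_REFLECTED if crc & 1 else 0)
    return crc & 0xFFFFFFFF


def crc32c(message: bytes) -> int:
    """Conventional CRC-32C built on top of the raw accumulation step."""
    crc = 0xFFFFFFFF
    for byte in message:
        crc = crc32_step(crc, byte)
    return crc ^ 0xFFFFFFFF
```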

<div class="p"><!----></div>
<a 
id="popcnt21734"></a><tt>popcnt</tt> calculates the number of bits set in the source operand, which can
be 16-bit, 32-bit, or 64-bit general purpose register or memory location,
and stores this count in the destination operand, which has to be register of
the same size as source operand. The 64-bit variant is available only in long
mode.

<pre>
&nbsp;&nbsp;&nbsp;&nbsp;popcnt&nbsp;ecx,eax&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;;&nbsp;count&nbsp;bits&nbsp;set&nbsp;to&nbsp;1

</pre>

<div class="p"><!----></div>
<a 
id="lzcnt21735"></a>The SSE4a extension, which also includes the <tt>popcnt</tt> instruction introduced
by SSE4.2, at the same time adds the <tt>lzcnt</tt> instruction, which follows the
same syntax, and calculates the count of leading zero bits in source operand
(if the source operand is all zero bits, the total number of bits in source
operand is stored in destination).
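<div class="p"><!----></div>
Both counts are easy to model in Python. The function names and the <tt>bits</tt>
parameter (the operand size) are assumptions made for this sketch.

```python
def popcnt(value: int, bits: int = 32) -> int:
    """Number of bits set in the low `bits` bits of value."""
    return bin(value & ((1 << bits) - 1)).count("1")


def lzcnt(value: int, bits: int = 32) -> int:
    """Number of leading zero bits; equals `bits` when value is zero."""
    value &= (1 << bits) - 1
    return bits - value.bit_length()
```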

<div class="p"><!----></div>
<a 
id="extrq21736"></a><tt>extrq</tt> extracts the sequence of bits from the low quad word of SSE register
provided as first operand and stores them at the low end of this register,
filling the remaining bits in the low quad word with zeros. The position of bit
string and its length can either be provided with two 8-bit immediate values
as second and third operand, or by SSE register as second operand (and there
is no third operand in such case), which should contain position value in bits
8-13 and length of bit string in bits 0-5.

................................................................................
<pre>
&nbsp;&nbsp;&nbsp;&nbsp;extrq&nbsp;xmm0,8,7&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;;&nbsp;extract&nbsp;8&nbsp;bits&nbsp;from&nbsp;position&nbsp;7
&nbsp;&nbsp;&nbsp;&nbsp;extrq&nbsp;xmm0,xmm5&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;;&nbsp;extract&nbsp;bits&nbsp;defined&nbsp;by&nbsp;register

</pre>
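<div class="p"><!----></div>
The bit field extraction of <tt>extrq</tt> can be sketched in Python. The function
names are hypothetical, only the low quad word is modelled, and the sketch
assumes a length of at least one bit with the field lying entirely inside the
quad word.

```python
def extrq(low_qword: int, length: int, position: int) -> int:
    """Extract `length` bits starting at bit `position` of the low quad word."""
    if not (1 <= length and 0 <= position and position + length <= 64):
        raise ValueError("field outside the range covered by this sketch")
    return (low_qword >> position) & ((1 << length) - 1)


def extrq_reg(low_qword: int, control: int) -> int:
    """Register form: length in bits 0-5, position in bits 8-13 of control."""
    return extrq(low_qword, control & 0x3F, (control >> 8) & 0x3F)
```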

<div class="p"><!----></div>
<a 
id="insertq21737"></a><tt>insertq</tt> writes the sequence of bits from the low quad word of the source
operand into specified position in low quad word of the destination operand,
leaving the other bits in low quad word of destination intact. The position
where bits should be written and the length of bit string can either be
provided with two 8-bit immediate values as third and fourth operand, or by
the bit fields in source operand (and there are only two operands in such
case), which should contain position value in bits 72-77 and length of bit
string in bits 64-69.
................................................................................
<pre>
&nbsp;&nbsp;&nbsp;&nbsp;insertq&nbsp;xmm1,xmm0,4,2&nbsp;;&nbsp;insert&nbsp;4&nbsp;bits&nbsp;at&nbsp;position&nbsp;2
&nbsp;&nbsp;&nbsp;&nbsp;insertq&nbsp;xmm1,xmm0&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;;&nbsp;insert&nbsp;bits&nbsp;defined&nbsp;by&nbsp;register

</pre>
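<div class="p"><!----></div>
The effect of <tt>insertq</tt> on the low quad word can be sketched in the same way.
The function name is hypothetical, and the sketch assumes a length of at least
one bit with the field lying entirely inside the quad word; the lowest
<tt>length</tt> bits of the source are the ones written.

```python
MASK64 = (1 << 64) - 1


def insertq(dest_low: int, src_low: int, length: int, position: int) -> int:
    """Write the low `length` bits of src_low at bit `position` of dest_low."""
    if not (1 <= length and 0 <= position and position + length <= 64):
        raise ValueError("field outside the range covered by this sketch")
    mask = ((1 << length) - 1) << position
    # Bits outside the field keep their value from the destination.
    return (dest_low & ~mask & MASK64) | ((src_low << position) & mask)
```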

<div class="p"><!----></div>
<a 
id="movntss21738"></a><a 
id="movntsd21739"></a><tt>movntss</tt> and <tt>movntsd</tt> store single or double precision floating point
value from the source SSE register into 32-bit or 64-bit destination memory
location respectively, using non-temporal hint.

<div class="p"><!----></div>
     <a id="tth_sEc2.1.21"></a><h3>
2.1.21&nbsp;&nbsp;AVX instructions</h3>

................................................................................
variant has a new syntax with three operands - the destination and two sources.
The destination and first source can be SSE registers, and second source can be
SSE register or memory. If the operation is performed on single pair of values,
the remaining bits of first source SSE register are copied into the
destination register.

<pre>
&nbsp;&nbsp;&nbsp;&nbsp;vsubss&nbsp;xmm0,xmm2,xmm3&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;;&nbsp;subtract&nbsp;two&nbsp;32-bit&nbsp;floats
&nbsp;&nbsp;&nbsp;&nbsp;vmulsd&nbsp;xmm0,xmm7,qword&nbsp;[esi]&nbsp;&nbsp;;&nbsp;multiply&nbsp;two&nbsp;64-bit&nbsp;floats

</pre>
In case of packed operations, each instruction can also operate on the 256-bit
data size when the AVX registers are specified instead of SSE registers, and
the size of memory operand is also doubled then.

................................................................................

</pre>
The promotion to new syntax according to the rules described above has been
applied to all the instructions from SSE extensions up to SSE4, with the
exceptions described below.

<div class="p"><!----></div>
<a 
id="vdppd21740"></a><tt>vdppd</tt> instruction has syntax extended to four operands, but it does not
have a 256-bit version.

<div class="p"><!----></div>
<a 
id="vsqrtpd21741"></a><a 
id="vsqrtps21742"></a><a 
id="vrcpps21743"></a><a 

id="vrsqrtps21744"></a>There are a few instructions, namely <tt>vsqrtpd</tt>, <tt>vsqrtps</tt>, <tt>vrcpps</tt> and
<tt>vrsqrtps</tt>, which can operate on 256-bit data size, but retained the syntax
with only two operands, because they use data from only one source:

<pre>
&nbsp;&nbsp;&nbsp;&nbsp;vsqrtpd&nbsp;ymm1,ymm0&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;;&nbsp;put&nbsp;square&nbsp;roots&nbsp;into&nbsp;other&nbsp;register

</pre>

<div class="p"><!----></div>
<a 
id="vroundpd21745"></a><a 
id="vroundps21746"></a>In a similar way <tt>vroundpd</tt> and <tt>vroundps</tt> retained the syntax with three
operands, the last one being immediate value.

<pre>
&nbsp;&nbsp;&nbsp;&nbsp;vroundps&nbsp;ymm0,ymm1,0011b&nbsp;&nbsp;;&nbsp;round&nbsp;toward&nbsp;zero

</pre>

<div class="p"><!----></div>
<a 
id="vpcmpestri21747"></a><a 
id="vpcmpestrm21748"></a><a 
id="vpcmpistri21749"></a><a 
id="vpcmpistrm21750"></a><a 
id="vphminposuw21751"></a><a 
id="vpshufd21752"></a><a 
id="vpshufhw21753"></a><a 
id="vpshuflw21754"></a><a 
id="vcomiss21755"></a><a 
id="vcomisd21756"></a><a 
id="vcvtss2si21757"></a><a 
id="vcvtsd2si21758"></a><a 
id="vcvttss2si21759"></a><a 
id="vcvttsd2si21760"></a><a 
id="vextractps21761"></a><a 
id="vpextrb21762"></a><a 
id="vpextrw21763"></a><a 
id="vpextrd21764"></a><a 
id="vpextrq21765"></a><a 

id="vmovd21766"></a><a 
id="vmovq21767"></a><a 
id="vmovntdqa21768"></a><a 
id="vmaskmovdqu21769"></a><a 
id="vpmovmskb21770"></a><a 
id="vpmovsxbw21771"></a><a 
id="vpmovsxbd21772"></a><a 
id="vpmovsxbq21773"></a><a 
id="vpmovsxwd21774"></a><a 
id="vpmovsxwq21775"></a><a 
id="vpmovsxdq21776"></a><a 
id="vpmovzxbw21777"></a><a 
id="vpmovzxbd21778"></a><a 
id="vpmovzxbq21779"></a><a 
id="vpmovzxwd21780"></a><a 
id="vpmovzxwq21781"></a><a 

id="vpmovzxdq21782"></a>Also some of the operations on packed integers kept their two-operand or
three-operand syntax while being promoted to AVX version. In such case these
instructions follow exactly the same rules for operands as their SSE
counterparts (since operations on packed integers do not have 256-bit variants
in AVX extension). These include <tt>vpcmpestri</tt>, <tt>vpcmpestrm</tt>, <tt>vpcmpistri</tt>,
<tt>vpcmpistrm</tt>, <tt>vphminposuw</tt>, <tt>vpshufd</tt>, <tt>vpshufhw</tt>, <tt>vpshuflw</tt>. And there are
more instructions that in AVX versions keep exactly the same syntax for
operands as the one from SSE, without any additional options: <tt>vcomiss</tt>,
................................................................................
<tt>vcomisd</tt>, <tt>vcvtss2si</tt>, <tt>vcvtsd2si</tt>, <tt>vcvttss2si</tt>, <tt>vcvttsd2si</tt>, <tt>vextractps</tt>,
<tt>vpextrb</tt>, <tt>vpextrw</tt>, <tt>vpextrd</tt>, <tt>vpextrq</tt>, <tt>vmovd</tt>, <tt>vmovq</tt>, <tt>vmovntdqa</tt>,
<tt>vmaskmovdqu</tt>, <tt>vpmovmskb</tt>, <tt>vpmovsxbw</tt>, <tt>vpmovsxbd</tt>, <tt>vpmovsxbq</tt>, <tt>vpmovsxwd</tt>,
<tt>vpmovsxwq</tt>, <tt>vpmovsxdq</tt>, <tt>vpmovzxbw</tt>, <tt>vpmovzxbd</tt>, <tt>vpmovzxbq</tt>, <tt>vpmovzxwd</tt>,
<tt>vpmovzxwq</tt> and <tt>vpmovzxdq</tt>.

<div class="p"><!----></div>
<a 
id="vcvtdq2ps21783"></a><a 
id="vcvtps2dq21784"></a><a 
id="vcvttps2dq21785"></a><a 
id="vmovaps21786"></a><a 
id="vmovapd21787"></a><a 
id="vmovups21788"></a><a 
id="vmovupd21789"></a><a 
id="vmovdqa21790"></a><a 
id="vmovdqu21791"></a><a 
id="vlddqu21792"></a><a 
id="vmovntps21793"></a><a 
id="vmovntpd21794"></a><a 
id="vmovntdq21795"></a><a 
id="vmovsldup21796"></a><a 
id="vmovshdup21797"></a><a 
id="vmovmskps21798"></a><a 

id="vmovmskpd21799"></a>The move and conversion instructions have mostly been promoted to allow
256-bit size operands in addition to the 128-bit variant with syntax identical
to that from SSE version of the same instruction. Each of the
<tt>vcvtdq2ps</tt>, <tt>vcvtps2dq</tt> and <tt>vcvttps2dq</tt>,
<tt>vmovaps</tt>, <tt>vmovapd</tt>, <tt>vmovups</tt>, <tt>vmovupd</tt>,
<tt>vmovdqa</tt>, <tt>vmovdqu</tt>, <tt>vlddqu</tt>,
<tt>vmovntps</tt>, <tt>vmovntpd</tt>, <tt>vmovntdq</tt>,
<tt>vmovsldup</tt>, <tt>vmovshdup</tt>,
................................................................................

<pre>
&nbsp;&nbsp;&nbsp;&nbsp;vmovups&nbsp;[edi],ymm6&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;;&nbsp;store&nbsp;unaligned&nbsp;256-bit&nbsp;data

</pre>

<div class="p"><!----></div>
<a 
id="vmovddup21800"></a><tt>vmovddup</tt> has the identical 128-bit syntax as its SSE version, and it also
has a 256-bit version, which stores the duplicates of the lowest quad word
from the source operand in the lower half of destination operand, and in the
upper half of destination the duplicates of the low quad word from the upper
half of source. Both source and destination operands need then to be 256-bit
values.

<div class="p"><!----></div>
<a 
id="vmovlhps21801"></a><a 
id="vmovhlps21802"></a><tt>vmovlhps</tt> and <tt>vmovhlps</tt> have only 128-bit versions, and each takes three
operands, which all must be SSE registers. <tt>vmovlhps</tt> copies two single
precision values from the low quad word of second source register to the high
quad word of destination register, and copies the low quad word of first
source register into the low quad word of destination register. <tt>vmovhlps</tt>
copies two single precision values from the high quad word of second source
register to the low quad word of destination register, and copies the high
quad word of first source register into the high quad word of destination
register.
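<div class="p"><!----></div>
The data movement of these two instructions can be written down as a small
Python sketch. The function names mirror the instructions, and the list
representation (four single precision values, lowest element first) is an
assumption made for illustration.

```python
def vmovlhps(src1: list[float], src2: list[float]) -> list[float]:
    """Low quad word from first source, high quad word from low of second."""
    return src1[0:2] + src2[0:2]


def vmovhlps(src1: list[float], src2: list[float]) -> list[float]:
    """Low quad word from high of second source, high from high of first."""
    return src2[2:4] + src1[2:4]
```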

<div class="p"><!----></div>
<a 
id="vmovlps21803"></a><a 
id="vmovhps21804"></a><a 
id="vmovlpd21805"></a><a 

id="vmovhpd21806"></a><tt>vmovlps</tt>, <tt>vmovhps</tt>, <tt>vmovlpd</tt> and <tt>vmovhpd</tt> have only 128-bit versions and
their syntax varies depending on whether memory operand is a destination or
source. When memory is destination, the syntax is identical to the one of
equivalent SSE instruction, and when memory is source, the instruction requires
three operands, first two being SSE registers and the third one 64-bit memory.
The value put into destination is then the value copied from first source with
either low or high quad word replaced with value from second source (the
memory operand).
................................................................................
<pre>
&nbsp;&nbsp;&nbsp;&nbsp;vmovhps&nbsp;[esi],xmm7&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;;&nbsp;store&nbsp;upper&nbsp;half&nbsp;to&nbsp;memory
&nbsp;&nbsp;&nbsp;&nbsp;vmovlps&nbsp;xmm0,xmm7,[ebx]&nbsp;&nbsp;;&nbsp;low&nbsp;from&nbsp;memory,&nbsp;rest&nbsp;from&nbsp;register

</pre>

<div class="p"><!----></div>
<a 
id="vmovss21807"></a><a 
id="vmovsd21808"></a><tt>vmovss</tt> and <tt>vmovsd</tt> have syntax identical to their SSE equivalents as long
as one of the operands is memory, while the versions that operate purely on
registers require three operands (each being SSE register). The value stored
in destination is then the value copied from first source with lowest data
element replaced with the lowest value from second source.

<pre>
&nbsp;&nbsp;&nbsp;&nbsp;vmovss&nbsp;xmm3,[edi]&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;;&nbsp;low&nbsp;from&nbsp;memory,&nbsp;rest&nbsp;zeroed
&nbsp;&nbsp;&nbsp;&nbsp;vmovss&nbsp;xmm0,xmm1,xmm2&nbsp;&nbsp;&nbsp;&nbsp;;&nbsp;one&nbsp;value&nbsp;from&nbsp;xmm2,&nbsp;three&nbsp;from&nbsp;xmm1

</pre>

<div class="p"><!----></div>
<a 
id="vcvtss2sd21809"></a><a 
id="vcvtsd2ss21810"></a><a 
id="vcvtsi2ss21811"></a><a 

id="vcvtsi2d21812"></a><tt>vcvtss2sd</tt>, <tt>vcvtsd2ss</tt>, <tt>vcvtsi2ss</tt> and <tt>vcvtsi2sd</tt> use the three-operand
syntax, where destination and first source are always SSE registers, and the
second source follows the same rules as the source in syntax of equivalent
SSE instruction. The value stored in destination is then the value copied from
first source with lowest data element replaced with the result of conversion.

<pre>
&nbsp;&nbsp;&nbsp;&nbsp;vcvtsi2sd&nbsp;xmm4,xmm4,ecx&nbsp;&nbsp;;&nbsp;32-bit&nbsp;integer&nbsp;to&nbsp;64-bit&nbsp;float
&nbsp;&nbsp;&nbsp;&nbsp;vcvtsi2ss&nbsp;xmm0,xmm0,rax&nbsp;&nbsp;;&nbsp;64-bit&nbsp;integer&nbsp;to&nbsp;32-bit&nbsp;float

</pre>

<div class="p"><!----></div>
<a 
id="vcvtdq2pd21813"></a><a 
id="vcvtps2pd21814"></a><a 
id="vcvtpd2dq21815"></a><a 
id="vcvttpd2dq21816"></a><a 

id="vcvtpd2ps21817"></a><tt>vcvtdq2pd</tt> and <tt>vcvtps2pd</tt> allow the same syntax as their SSE equivalents,
plus the new variants with AVX register as destination and SSE register or
128-bit memory as source. Analogously <tt>vcvtpd2dq</tt>, <tt>vcvttpd2dq</tt> and
<tt>vcvtpd2ps</tt>, in addition to variant with syntax identical to SSE version,
allow a variant with SSE register as destination and AVX register or 256-bit
memory as source.

<div class="p"><!----></div>
<a 
id="vinsertps21818"></a><a 
id="vpinsrb21819"></a><a 
id="vpinsrw21820"></a><a 
id="vpinsrd21821"></a><a 
id="vpinsrq21822"></a><a 

id="vpblendw21823"></a><tt>vinsertps</tt>, <tt>vpinsrb</tt>, <tt>vpinsrw</tt>, <tt>vpinsrd</tt>, <tt>vpinsrq</tt> and <tt>vpblendw</tt> use
a syntax with four operands, where destination and first source have to be SSE
registers, and the third and fourth operand follow the same rules as second
and third operand in the syntax of equivalent SSE instruction. Value stored in
destination is the value copied from first source with some data elements
replaced with values extracted from the second source, analogously to the
operation of corresponding SSE instruction.

<pre>
&nbsp;&nbsp;&nbsp;&nbsp;vpinsrd&nbsp;xmm0,xmm0,eax,3&nbsp;&nbsp;;&nbsp;insert&nbsp;double&nbsp;word

</pre>

<div class="p"><!----></div>
<a 
id="vblendvps21824"></a><a 
id="vblendvpd21825"></a><a 

id="vpblendvb21826"></a><tt>vblendvps</tt>, <tt>vblendvpd</tt> and <tt>vpblendvb</tt> use a new syntax with four register
operands: destination, two sources and a mask, where second source can also be
a memory operand. <tt>vblendvps</tt> and <tt>vblendvpd</tt> have 256-bit variant, where
operands are AVX registers or 256-bit memory, as well as 128-bit variant,
which has operands being SSE registers or 128-bit memory. <tt>vpblendvb</tt> has only
a 128-bit variant. Value stored in destination is the value copied from the
first source with some data elements replaced, according to mask, by values
from the second source.
................................................................................

<pre>
&nbsp;&nbsp;&nbsp;&nbsp;vblendvps&nbsp;ymm3,ymm1,ymm2,ymm7&nbsp;&nbsp;;&nbsp;blend&nbsp;according&nbsp;to&nbsp;mask

</pre>

<div class="p"><!----></div>
<a 
id="vptest21827"></a><a 
id="vtestps21828"></a><a 

id="vtestpd21829"></a><tt>vptest</tt> allows the same syntax as its SSE version and also has a 256-bit
version, with both operands doubled in size. There are also two new
instructions, <tt>vtestps</tt> and <tt>vtestpd</tt>, which perform analogous tests, but only
of the sign bits of corresponding single precision or double precision values,
and set the ZF and CF accordingly. They follow the same syntax rules as
<tt>vptest</tt>.

<pre>
&nbsp;&nbsp;&nbsp;&nbsp;vptest&nbsp;ymm0,yword&nbsp;[ebx]&nbsp;&nbsp;;&nbsp;test&nbsp;256-bit&nbsp;values
&nbsp;&nbsp;&nbsp;&nbsp;vtestpd&nbsp;xmm0,xmm1&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;;&nbsp;test&nbsp;sign&nbsp;bits&nbsp;of&nbsp;64-bit&nbsp;floats

</pre>

<div class="p"><!----></div>
<a 
id="vbroadcastss21830"></a><a 
id="vbroadcastsd21831"></a><a 
id="vbroadcastf12821832"></a><tt>vbroadcastss</tt>, <tt>vbroadcastsd</tt> and <tt>vbroadcastf128</tt> are new instructions,
which broadcast the data element defined by source operand into all elements
of corresponding size in the destination register. <tt>vbroadcastss</tt> needs
source to be 32-bit memory and destination to be either SSE or AVX register.
<tt>vbroadcastsd</tt> requires 64-bit memory as source, and AVX register as
destination. <tt>vbroadcastf128</tt> requires 128-bit memory as source, and AVX
register as destination.

<pre>
&nbsp;&nbsp;&nbsp;&nbsp;vbroadcastss&nbsp;ymm0,dword&nbsp;[eax]&nbsp;&nbsp;;&nbsp;get&nbsp;eight&nbsp;copies&nbsp;of&nbsp;value

</pre>

<div class="p"><!----></div>
<a 
id="vinsertf12821833"></a><tt>vinsertf128</tt> is the new instruction, which takes four operands. The
destination and first source have to be AVX registers, second source can be
SSE register or 128-bit memory location, and fourth operand should be an
immediate value. It stores in destination the value obtained by taking
contents of first source and replacing one of its 128-bit units with value of
the second source. The lowest bit of fourth operand specifies at which
position that replacement is done (either 0 or 1).

<div class="p"><!----></div>
<a 
id="vextractf12821834"></a><tt>vextractf128</tt> is the new instruction with three operands. The destination
needs to be SSE register or 128-bit memory location, the source must be AVX
register, and the third operand should be an immediate value. It extracts
into destination one of the 128-bit units from source. The lowest bit of third
operand specifies, which unit is extracted.
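<div class="p"><!----></div>
The way <tt>vinsertf128</tt> and <tt>vextractf128</tt> address the two 128-bit units of an
AVX register can be sketched in Python. The function names mirror the
instructions; representing a 256-bit value as a two-element list
<tt>[low, high]</tt> is an assumption made for illustration.

```python
def vinsertf128(src1: list[int], src2: int, imm8: int) -> list[int]:
    """Copy src1 and replace the unit selected by the lowest bit of imm8."""
    result = list(src1)
    result[imm8 & 1] = src2
    return result


def vextractf128(src: list[int], imm8: int) -> int:
    """Return the unit of src selected by the lowest bit of imm8."""
    return src[imm8 & 1]
```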

<div class="p"><!----></div>
<a 
id="vmaskmovps21835"></a><a 
id="vmaskmovpd21836"></a><tt>vmaskmovps</tt> and <tt>vmaskmovpd</tt> are the new instructions with three operands
that selectively store in destination the elements from second source
depending on the sign bits of corresponding elements from first source. These
instructions can operate on either 128-bit data (SSE registers) or 256-bit
data (AVX registers). Either destination or second source has to be a memory
location of appropriate size, the two other operands should be registers.

<pre>
&nbsp;&nbsp;&nbsp;&nbsp;vmaskmovps&nbsp;[edi],xmm0,xmm5&nbsp;&nbsp;;&nbsp;conditionally&nbsp;store
&nbsp;&nbsp;&nbsp;&nbsp;vmaskmovpd&nbsp;ymm5,ymm0,[esi]&nbsp;&nbsp;;&nbsp;conditionally&nbsp;load

</pre>
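<div class="p"><!----></div>
The two forms of the masked move can be modelled as below. This is a sketch with
hypothetical function names; the mask elements are given as 32-bit integers,
only their sign bit (bit 31) is examined, and elements not selected by a load
are set to zero.

```python
def sign_bit_set(element: int) -> bool:
    """True when bit 31 of a 32-bit mask element is set."""
    return bool(element & 0x80000000)


def vmaskmov_load(mask: list[int], memory: list[float]) -> list[float]:
    """Register destination: selected elements come from memory, others are 0."""
    return [m if sign_bit_set(k) else 0.0 for k, m in zip(mask, memory)]


def vmaskmov_store(memory: list[float], mask: list[int],
                   source: list[float]) -> list[float]:
    """Memory destination: selected elements are overwritten, others kept."""
    return [s if sign_bit_set(k) else m
            for m, k, s in zip(memory, mask, source)]
```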

<div class="p"><!----></div>
<a 
id="vpermilpd21837"></a><a 
id="vpermilps21838"></a><tt>vpermilpd</tt> and <tt>vpermilps</tt> are the new instructions with three operands
that permute the values from first source according to the control fields from
second source and put the result into destination operand. They accept
either three SSE registers or three AVX registers as operands, and the second
source can also be a memory of size equal to the registers used. In alternative
form the second source can be immediate value and then the first source
can be a memory location of the size equal to destination register.

<div class="p"><!----></div>
<a 
id="vperm2f12821839"></a><tt>vperm2f128</tt> is the new instruction with four operands, which selects
128-bit blocks of floating point data from first and second source according
to the bit fields from fourth operand, and stores them in destination.
Destination and first source need to be AVX registers, second source can be
AVX register or 256-bit memory area, and fourth operand should be an immediate
value.

<pre>
&nbsp;&nbsp;&nbsp;&nbsp;vperm2f128&nbsp;ymm0,ymm6,ymm7,12h&nbsp;&nbsp;;&nbsp;permute&nbsp;128-bit&nbsp;blocks

</pre>
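<div class="p"><!----></div>
The selection performed by <tt>vperm2f128</tt> can be sketched in Python. The function
name mirrors the instruction; each 256-bit operand is represented as a
two-element list <tt>[low, high]</tt> of 128-bit units, which is an assumption made
for illustration. In each four-bit field of the immediate the low two bits pick
one of the four available units and the highest bit zeroes the result instead.

```python
def vperm2f128(src1: list[int], src2: list[int], imm8: int) -> list[int]:
    """Build the destination from the four 128-bit units of both sources."""
    units = [src1[0], src1[1], src2[0], src2[1]]

    def pick(field: int) -> int:
        if field & 0b1000:
            return 0                 # bit 3 of the field: zero this half
        return units[field & 0b11]   # bits 0-1 of the field: unit index

    return [pick(imm8 & 0xF), pick((imm8 >> 4) & 0xF)]
```

With the immediate 12h the low half comes from the low unit of second source
and the high half from the high unit of first source.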

<div class="p"><!----></div>
<a 
id="vzeroall21840"></a><a 
id="vzeroupper21841"></a><tt>vzeroall</tt> instruction sets all the AVX registers to zero. <tt>vzeroupper</tt> sets
the upper 128-bit portions of all AVX registers to zero, leaving the SSE
registers intact. These new instructions take no operands.

<div class="p"><!----></div>
<a 
id="vldmxcsr21842"></a><a 
id="vstmxcsr21843"></a><tt>vldmxcsr</tt> and <tt>vstmxcsr</tt> are the AVX versions of <tt>ldmxcsr</tt> and <tt>stmxcsr</tt>
instructions. The rules for their operands remain unchanged.

<div class="p"><!----></div>
     <a id="tth_sEc2.1.22"></a><h3>
2.1.22&nbsp;&nbsp;AVX2 instructions</h3>

<div class="p"><!----></div>
................................................................................
<div class="p"><!----></div>
The AVX instructions that operate on packed integers and had only 128-bit
variants have been supplemented with 256-bit variants, and thus their syntax
rules became analogous to AVX instructions operating on packed floating point
types.

<pre>
&nbsp;&nbsp;&nbsp;&nbsp;vpsubb&nbsp;ymm0,ymm0,[esi]&nbsp;&nbsp;&nbsp;;&nbsp;subtract&nbsp;32&nbsp;packed&nbsp;bytes
&nbsp;&nbsp;&nbsp;&nbsp;vpavgw&nbsp;ymm3,ymm0,ymm2&nbsp;&nbsp;&nbsp;&nbsp;;&nbsp;average&nbsp;of&nbsp;16-bit&nbsp;integers

</pre>
However there are some instructions that have not been equipped with the
256-bit variants. <tt>vpcmpestri</tt>, <tt>vpcmpestrm</tt>, <tt>vpcmpistri</tt>, <tt>vpcmpistrm</tt>,
<tt>vpextrb</tt>, <tt>vpextrw</tt>, <tt>vpextrd</tt>, <tt>vpextrq</tt>, <tt>vpinsrb</tt>, <tt>vpinsrw</tt>, <tt>vpinsrd</tt>,
<tt>vpinsrq</tt> and <tt>vphminposuw</tt> are not affected by AVX2 and allow only the
................................................................................
<pre>
&nbsp;&nbsp;&nbsp;&nbsp;vpsllw&nbsp;ymm2,ymm2,xmm4&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;;&nbsp;shift&nbsp;words&nbsp;left
&nbsp;&nbsp;&nbsp;&nbsp;vpsrad&nbsp;ymm0,ymm3,xword&nbsp;[ebx]&nbsp;;&nbsp;shift&nbsp;double&nbsp;words&nbsp;right

</pre>

<div class="p"><!----></div>
<a 
id="vpsllvd21844"></a><a 
id="vpsllvq21845"></a><a 
id="vpsrlvd21846"></a><a 
id="vpsrlvq21847"></a><a 

id="vpsravd21848"></a>There are also new packed shift instructions with standard three-operand AVX
syntax, which shift each element from first source by the amount specified in
corresponding element of second source, and store the results in destination.
<tt>vpsllvd</tt> shifts 32-bit elements left, <tt>vpsllvq</tt> shifts 64-bit elements left,
<tt>vpsrlvd</tt> shifts 32-bit elements right logically, <tt>vpsrlvq</tt> shifts 64-bit
elements right logically and <tt>vpsravd</tt> shifts 32-bit elements right
arithmetically.
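
<div class="p"><!----></div>
As a brief sketch of this syntax (the registers and the address are chosen
arbitrarily for illustration and are not taken from the original text), the
per-element shifts might be written as:

<pre>
&nbsp;&nbsp;&nbsp;&nbsp;vpsllvd&nbsp;ymm0,ymm1,ymm2&nbsp;&nbsp;&nbsp;;&nbsp;shift&nbsp;each&nbsp;double&nbsp;word&nbsp;of&nbsp;ymm1&nbsp;left&nbsp;by&nbsp;count&nbsp;from&nbsp;ymm2
&nbsp;&nbsp;&nbsp;&nbsp;vpsravd&nbsp;xmm0,xmm1,[ebx]&nbsp;&nbsp;;&nbsp;arithmetic&nbsp;right&nbsp;shift,&nbsp;counts&nbsp;taken&nbsp;from&nbsp;memory

</pre>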

................................................................................

<div class="p"><!----></div>
Also <tt>vmovntdqa</tt> has been upgraded with a 256-bit variant, which transfers
a 256-bit value from memory to an AVX register and needs the memory address
to be aligned to 32 bytes.

<div class="p"><!----></div>
<a 
id="vpmaskmovd21849"></a><a 
id="vpmaskmovq21850"></a><tt>vpmaskmovd</tt> and <tt>vpmaskmovq</tt> are the new instructions with syntax identical
to <tt>vmaskmovps</tt> or <tt>vmaskmovpd</tt>, and they perform analogous operations on
packed 32-bit or 64-bit values.
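
<div class="p"><!----></div>
Assuming the two forms known from <tt>vmaskmovps</tt> (a masked load into a register
and a masked store into memory), a minimal illustration with arbitrarily
chosen operands could look like this:

<pre>
&nbsp;&nbsp;&nbsp;&nbsp;vpmaskmovd&nbsp;ymm0,ymm1,[esi]&nbsp;&nbsp;;&nbsp;load&nbsp;double&nbsp;words&nbsp;selected&nbsp;by&nbsp;mask&nbsp;in&nbsp;ymm1
&nbsp;&nbsp;&nbsp;&nbsp;vpmaskmovq&nbsp;[edi],ymm1,ymm0&nbsp;&nbsp;;&nbsp;store&nbsp;quad&nbsp;words&nbsp;selected&nbsp;by&nbsp;mask&nbsp;in&nbsp;ymm1

</pre>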

<div class="p"><!----></div>
<a 
id="vinserti12821851"></a><a 
id="vextracti12821852"></a><a 
id="vbroadcasti12821853"></a><a 

id="vperm2i12821854"></a><tt>vinserti128</tt>, <tt>vextracti128</tt>, <tt>vbroadcasti128</tt> and <tt>vperm2i128</tt> are the new
instructions with syntax identical to <tt>vinsertf128</tt>, <tt>vextractf128</tt>,
<tt>vbroadcastf128</tt> and <tt>vperm2f128</tt> respectively, and they perform analogous
operations on 128-bit blocks of integer data.
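
<div class="p"><!----></div>
The following lines are only a sketch of how these integer variants might be
used, with operands picked for illustration and mirroring the rules of their
floating point counterparts:

<pre>
&nbsp;&nbsp;&nbsp;&nbsp;vinserti128&nbsp;ymm0,ymm1,xmm2,1&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;;&nbsp;copy&nbsp;ymm1&nbsp;with&nbsp;upper&nbsp;block&nbsp;replaced&nbsp;by&nbsp;xmm2
&nbsp;&nbsp;&nbsp;&nbsp;vextracti128&nbsp;xmm3,ymm0,0&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;;&nbsp;take&nbsp;the&nbsp;lower&nbsp;block&nbsp;of&nbsp;ymm0
&nbsp;&nbsp;&nbsp;&nbsp;vbroadcasti128&nbsp;ymm4,xword&nbsp;[ebx]&nbsp;&nbsp;;&nbsp;repeat&nbsp;128-bit&nbsp;block&nbsp;in&nbsp;both&nbsp;halves

</pre>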

<div class="p"><!----></div>
<tt>vbroadcastss</tt> and <tt>vbroadcastsd</tt> instructions have been extended to allow
SSE register as a source operand (which in AVX could only be a memory).

<div class="p"><!----></div>
<a 
id="vpbroadcastb21855"></a><a 
id="vpbroadcastw21856"></a><a 
id="vpbroadcastd21857"></a><a 
id="vpbroadcastq21858"></a><tt>vpbroadcastb</tt>, <tt>vpbroadcastw</tt>, <tt>vpbroadcastd</tt> and <tt>vpbroadcastq</tt> are the
new instructions which broadcast the byte, word, double word or quad word from
the source operand into all elements of corresponding size in the destination
register. The destination operand can be either SSE or AVX register, and the
source operand can be SSE register or memory of size equal to the size of data
element.

<pre>
&nbsp;&nbsp;&nbsp;&nbsp;vpbroadcastb&nbsp;ymm0,byte&nbsp;[ebx]&nbsp;&nbsp;;&nbsp;get&nbsp;32&nbsp;identical&nbsp;bytes

</pre>

<div class="p"><!----></div>
<a 
id="vpermd21859"></a><a 
id="vpermps21860"></a><tt>vpermd</tt> and <tt>vpermps</tt> are new three-operand instructions, which use each
32-bit element from first source as an index of element in second source which
is copied into destination at position corresponding to element containing
index. The destination and first source have to be AVX registers, and the
second source can be AVX register or 256-bit memory.
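
<div class="p"><!----></div>
For illustration only (the operands are arbitrary), a permutation with the
indexes held in a register and the data read from memory might be written as:

<pre>
&nbsp;&nbsp;&nbsp;&nbsp;vpermd&nbsp;ymm0,ymm1,[esi]&nbsp;&nbsp;&nbsp;;&nbsp;indexes&nbsp;in&nbsp;ymm1&nbsp;select&nbsp;double&nbsp;words&nbsp;from&nbsp;memory

</pre>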

<div class="p"><!----></div>
<a 
id="vpermq21861"></a><a 
id="vpermpd21862"></a><tt>vpermq</tt> and <tt>vpermpd</tt> are new three-operand instructions, which use 2-bit
indexes from the immediate value specified as third operand to determine which
element from source to store at given position in destination. The destination
has to be AVX register, source can be AVX register or 256-bit memory, and the
third operand must be 8-bit immediate value.
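
<div class="p"><!----></div>
As a worked sketch with arbitrarily chosen operands: in the immediate value,
bits 0-1 select the element for the lowest position, bits 2-3 for the next
one, and so on, so the value 00011011b picks elements 3, 2, 1 and 0 in turn.

<pre>
&nbsp;&nbsp;&nbsp;&nbsp;vpermq&nbsp;ymm0,ymm1,00011011b&nbsp;&nbsp;;&nbsp;reverse&nbsp;the&nbsp;order&nbsp;of&nbsp;four&nbsp;quad&nbsp;words
&nbsp;&nbsp;&nbsp;&nbsp;vpermpd&nbsp;ymm2,[esi],0&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;;&nbsp;copy&nbsp;the&nbsp;lowest&nbsp;double&nbsp;to&nbsp;all&nbsp;positions

</pre>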

<div class="p"><!----></div>
The family of new instructions performing <tt>gather</tt> operation have special
................................................................................
destination and mask registers, the higher elements of destination are zeroed.
After the value is successfully loaded, the corresponding element in mask
register is set to zero. The destination, index and mask should all be
distinct registers, it is not allowed to use the same register in two
different roles.

<div class="p"><!----></div>
<a 
id="vgatherdps21863"></a><tt>vgatherdps</tt> loads single precision floating point values addressed by
32-bit indexes. The destination, index and mask should all be registers of the
same type, either SSE or AVX. The data addressed by memory operand is 32-bit
in size.

<pre>
&nbsp;&nbsp;&nbsp;&nbsp;vgatherdps&nbsp;xmm0,[eax+xmm1],xmm3&nbsp;&nbsp;&nbsp;&nbsp;;&nbsp;gather&nbsp;four&nbsp;floats
&nbsp;&nbsp;&nbsp;&nbsp;vgatherdps&nbsp;ymm0,[ebx+ymm7*4],ymm3&nbsp;&nbsp;;&nbsp;gather&nbsp;eight&nbsp;floats

</pre>

<div class="p"><!----></div>
<a 
id="vgatherqps21864"></a><tt>vgatherqps</tt> loads single precision floating point values addressed by
64-bit indexes. The destination and mask should always be SSE registers, while
index register can be either SSE or AVX register. The data addressed by memory
operand is 32-bit in size.

<pre>
&nbsp;&nbsp;&nbsp;&nbsp;vgatherqps&nbsp;xmm0,[xmm2],xmm3&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;;&nbsp;gather&nbsp;two&nbsp;floats
&nbsp;&nbsp;&nbsp;&nbsp;vgatherqps&nbsp;xmm0,[ymm2+64],xmm3&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;;&nbsp;gather&nbsp;four&nbsp;floats

</pre>

<div class="p"><!----></div>
<a 
id="vgatherdpd21865"></a><tt>vgatherdpd</tt> loads double precision floating point values addressed by
32-bit indexes. The index register should always be SSE register, the
destination and mask should be two registers of the same type, either SSE or
AVX. The data addressed by memory operand is 64-bit in size.

<pre>
&nbsp;&nbsp;&nbsp;&nbsp;vgatherdpd&nbsp;xmm0,[ebp+xmm1],xmm3&nbsp;&nbsp;&nbsp;&nbsp;;&nbsp;gather&nbsp;two&nbsp;doubles
&nbsp;&nbsp;&nbsp;&nbsp;vgatherdpd&nbsp;ymm0,[xmm3*8],ymm5&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;;&nbsp;gather&nbsp;four&nbsp;doubles

</pre>

<div class="p"><!----></div>
<a 
id="vgatherqpd21866"></a><tt>vgatherqpd</tt> loads double precision floating point values addressed by
64-bit indexes. The destination, index and mask should all be registers of the
same type, either SSE or AVX. The data addressed by memory operand is 64-bit
in size.
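
<div class="p"><!----></div>
A possible usage, sketched here with arbitrary registers and not taken from
the original text, follows the pattern of the other gather instructions:

<pre>
&nbsp;&nbsp;&nbsp;&nbsp;vgatherqpd&nbsp;xmm0,[xmm1*8],xmm3&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;;&nbsp;gather&nbsp;two&nbsp;doubles
&nbsp;&nbsp;&nbsp;&nbsp;vgatherqpd&nbsp;ymm0,[eax+ymm1*8],ymm3&nbsp;&nbsp;;&nbsp;gather&nbsp;four&nbsp;doubles

</pre>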

<div class="p"><!----></div>
<a 
id="vpgatherdd21867"></a><a 
id="vpgatherqd21868"></a><tt>vpgatherdd</tt> and <tt>vpgatherqd</tt> load 32-bit values addressed by either 32-bit
or 64-bit indexes. They follow the same rules as <tt>vgatherdps</tt> and <tt>vgatherqps</tt>
respectively.

<div class="p"><!----></div>
<a 
id="vpgatherdq21869"></a><a 
id="vpgatherqq21870"></a><tt>vpgatherdq</tt> and <tt>vpgatherqq</tt> load 64-bit values addressed by either 32-bit
or 64-bit indexes. They follow the same rules as <tt>vgatherdpd</tt> and <tt>vgatherqpd</tt>
respectively.
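
<div class="p"><!----></div>
The integer gathers can be sketched in the same way; the operands below are
illustrative assumptions that simply follow the register type rules stated
for the corresponding floating point variants:

<pre>
&nbsp;&nbsp;&nbsp;&nbsp;vpgatherdd&nbsp;ymm0,[ebx+ymm1*4],ymm2&nbsp;&nbsp;;&nbsp;gather&nbsp;eight&nbsp;double&nbsp;words
&nbsp;&nbsp;&nbsp;&nbsp;vpgatherqd&nbsp;xmm0,[ymm1],xmm2&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;;&nbsp;gather&nbsp;four&nbsp;double&nbsp;words
&nbsp;&nbsp;&nbsp;&nbsp;vpgatherdq&nbsp;ymm0,[xmm1*8],ymm2&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;;&nbsp;gather&nbsp;four&nbsp;quad&nbsp;words
&nbsp;&nbsp;&nbsp;&nbsp;vpgatherqq&nbsp;xmm0,[xmm1*8],xmm2&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;;&nbsp;gather&nbsp;two&nbsp;quad&nbsp;words

</pre>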

<div class="p"><!----></div>
     <a id="tth_sEc2.1.23"></a><h3>
2.1.23&nbsp;&nbsp;Auxiliary sets of computational instructions</h3>

................................................................................
The AES extension provides a specialized set of instructions for the
purpose of cryptographic computations defined by Advanced Encryption Standard.
Each of these instructions has two versions: the AVX one and the one with
SSE-like syntax that uses classic encoding. Refer to the Intel manuals for the
details of operation of these instructions.

<div class="p"><!----></div>
<a 
id="aesenc21871"></a><a 
id="aesenclast21872"></a><a 
id="vaesenc21873"></a><a 

id="vaesenclast21874"></a><tt>aesenc</tt> and <tt>aesenclast</tt> perform a single round of AES encryption on data
from first source with a round key from second source, and store result in
destination. The destination and first source are SSE registers, and the
second source can be SSE register or 128-bit memory. The AVX versions of these
instructions, <tt>vaesenc</tt> and <tt>vaesenclast</tt>, use the syntax with three operands,
while the SSE-like version has only two operands, with first operand being
both the destination and first source.
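
<div class="p"><!----></div>
To contrast the two syntaxes, here is a short sketch with arbitrarily chosen
registers (which register holds the state and which the round key is an
assumption made for this example):

<pre>
&nbsp;&nbsp;&nbsp;&nbsp;aesenc&nbsp;xmm0,xmm1&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;;&nbsp;one&nbsp;round&nbsp;on&nbsp;xmm0&nbsp;with&nbsp;round&nbsp;key&nbsp;in&nbsp;xmm1
&nbsp;&nbsp;&nbsp;&nbsp;vaesenclast&nbsp;xmm2,xmm0,[ebx]&nbsp;&nbsp;;&nbsp;last&nbsp;round,&nbsp;result&nbsp;in&nbsp;separate&nbsp;destination

</pre>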

<div class="p"><!----></div>
<a 
id="aesdec21875"></a><a 
id="aesdeclast21876"></a><tt>aesdec</tt> and <tt>aesdeclast</tt> perform a single round of AES decryption on data
from first source with a round key from second source. The syntax rules for
them and their AVX versions are the same as for <tt>aesenc</tt>.

<div class="p"><!----></div>
<a 
id="aesimc21877"></a><a 
id="vaesimc21878"></a><tt>aesimc</tt> performs the InvMixColumns transformation of source operand and
stores the result in destination. Both <tt>aesimc</tt> and <tt>vaesimc</tt> use only two
operands, destination being SSE register, and source being SSE register or
128-bit memory location.

<div class="p"><!----></div>
<a 
id="aeskeygenassist21879"></a><tt>aeskeygenassist</tt> is a helper instruction for generating the round key.
It needs three operands: destination being SSE register, source being SSE
register or 128-bit memory, and third operand being 8-bit immediate value.
The AVX version of this instruction uses the same syntax.

<div class="p"><!----></div>
<a 
id="pclmulqdq21880"></a><a 
id="vpclmulqdq21881"></a>The CLMUL extension introduces just one instruction, <tt>pclmulqdq</tt>, and its
AVX version as well. This instruction performs a carryless multiplication of
two 64-bit values selected from first and second source according to the bit
fields in immediate value. The destination and first source are SSE registers,
second source is SSE register or 128-bit memory, and immediate value is
provided as last operand. <tt>vpclmulqdq</tt> takes four operands, while <tt>pclmulqdq</tt>
takes only three operands, with the first one serving both the role of
destination and first source.
................................................................................
The FMA (Fused Multiply-Add) extension introduces additional AVX
instructions which perform multiplication and summation as single operation.
Each one takes three operands, first one serving both the role of destination
and first source, and the following ones being the second and third source.
The mnemonic of FMA instruction is obtained by appending to <tt>vf</tt> prefix: first
either <tt>m</tt> or <tt>nm</tt> to select whether result of multiplication should be taken
as-is or negated, then either <tt>add</tt> or <tt>sub</tt> to select whether third value
will be added to the product or subtracted from the product, then either
<tt>132</tt>, <tt>213</tt> or <tt>231</tt> to select which source operands are multiplied and which
one is added or subtracted, and finally the type of data on which the
instruction operates, either <tt>ps</tt>, <tt>pd</tt>, <tt>ss</tt> or <tt>sd</tt>. As it was with SSE
instructions promoted to AVX, instructions operating on packed floating point
values allow 128-bit or 256-bit syntax, in former all the operands are SSE
registers, but the third one can also be a 128-bit memory, in latter the
operands are AVX registers and the third one can also be a 256-bit memory.
Instructions that compute just one floating point result need operands to be
SSE registers, and the third operand can also be a memory, either 32-bit for
single precision or 64-bit for double precision.

<pre>
&nbsp;&nbsp;&nbsp;&nbsp;vfmsub231ps&nbsp;ymm1,ymm2,ymm3&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;;&nbsp;multiply&nbsp;and&nbsp;subtract
&nbsp;&nbsp;&nbsp;&nbsp;vfnmadd132sd&nbsp;xmm0,xmm5,[ebx]&nbsp;&nbsp;&nbsp;;&nbsp;multiply,&nbsp;negate&nbsp;and&nbsp;add

</pre>
In addition to the instructions created by the rule described above, there are
families of instructions with mnemonics starting with either <tt>vfmaddsub</tt> or
<tt>vfmsubadd</tt>, followed by either <tt>132</tt>, <tt>213</tt> or <tt>231</tt> and then either <tt>ps</tt> or
<tt>pd</tt> (the operation must always be on packed values in this case). They add
to the result of multiplication or subtract from it depending on the position
of value in packed data - instructions from the <tt>vfmaddsub</tt> group add when the
position is odd and subtract when the position is even, instructions from the
<tt>vfmsubadd</tt> group add when the position is even and subtract when the
position is odd. The rules for operands are the same as for other FMA
instructions.
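
<div class="p"><!----></div>
A minimal sketch of these alternating forms, with operands chosen only for
illustration:

<pre>
&nbsp;&nbsp;&nbsp;&nbsp;vfmaddsub213pd&nbsp;ymm0,ymm1,[esi]&nbsp;&nbsp;;&nbsp;add&nbsp;at&nbsp;odd,&nbsp;subtract&nbsp;at&nbsp;even&nbsp;positions
&nbsp;&nbsp;&nbsp;&nbsp;vfmsubadd231ps&nbsp;xmm0,xmm1,xmm2&nbsp;&nbsp;&nbsp;;&nbsp;add&nbsp;at&nbsp;even,&nbsp;subtract&nbsp;at&nbsp;odd&nbsp;positions

</pre>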

<div class="p"><!----></div>
The FMA4 instructions are similar to FMA, but use syntax with four operands
and thus allow destination to be different than all the sources. Their
mnemonics are identical to FMA instructions with the <tt>132</tt>, <tt>213</tt> or <tt>231</tt> cut
out, as having separate destination operand makes such selection of operands
superfluous. The multiplication is always performed on values from the first
and second source, and then the value from third source is added or
subtracted. Either second or third source can be a memory operand, and the
rules for the sizes of operands are the same as for FMA instructions.

<pre>
&nbsp;&nbsp;&nbsp;&nbsp;vfmaddpd&nbsp;ymm0,ymm1,[esi],ymm2&nbsp;&nbsp;;&nbsp;multiply&nbsp;and&nbsp;add
&nbsp;&nbsp;&nbsp;&nbsp;vfmsubss&nbsp;xmm0,xmm1,xmm2,[ebx]&nbsp;&nbsp;;&nbsp;multiply&nbsp;and&nbsp;subtract

</pre>

<div class="p"><!----></div>
<a 
id="vcvtps2ph21882"></a><a 
id="vcvtph2ps21883"></a>The F16C extension consists of two instructions, <tt>vcvtps2ph</tt> and
<tt>vcvtph2ps</tt>, which convert floating point values between single precision and
half precision (the 16-bit floating point format). <tt>vcvtps2ph</tt> takes three
operands: destination, source, and rounding controls. The third operand is
always an immediate, the source is either SSE or AVX register containing
single precision values, and the destination is SSE register or memory, the
size of memory is 64 bits when the source is SSE register and 128 bits when
the source is AVX register. <tt>vcvtph2ps</tt> takes two operands, the destination
that can be SSE or AVX register, and the source that is SSE register or memory
with size of the half of destination operand's size.
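
<div class="p"><!----></div>
The operand sizes described above can be sketched as follows; the registers,
addresses and the zero used as rounding controls are arbitrary choices made
for this illustration:

<pre>
&nbsp;&nbsp;&nbsp;&nbsp;vcvtps2ph&nbsp;xmm0,ymm1,0&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;;&nbsp;eight&nbsp;floats&nbsp;to&nbsp;half&nbsp;precision
&nbsp;&nbsp;&nbsp;&nbsp;vcvtps2ph&nbsp;qword&nbsp;[edi],xmm1,0&nbsp;&nbsp;;&nbsp;four&nbsp;floats&nbsp;stored&nbsp;as&nbsp;64&nbsp;bits&nbsp;of&nbsp;halves
&nbsp;&nbsp;&nbsp;&nbsp;vcvtph2ps&nbsp;ymm0,xword&nbsp;[esi]&nbsp;&nbsp;&nbsp;&nbsp;;&nbsp;eight&nbsp;halves&nbsp;to&nbsp;single&nbsp;precision

</pre>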

<div class="p"><!----></div>
<a 
id="vfrczps21884"></a><a 
id="vfrczss21885"></a><a 
id="vfrczpd21886"></a><a 

id="vfrczsd21887"></a>The AMD XOP extension introduces a number of new vector instructions with
encoding and syntax analogous to AVX instructions. <tt>vfrczps</tt>, <tt>vfrczss</tt>,
<tt>vfrczpd</tt> and <tt>vfrczsd</tt> extract fractional portions of single or double
precision values, they all take two operands. The packed operations allow
either SSE or AVX register as destination, for the other two it has to be SSE
register. Source can be register of the same type as destination, or memory
of appropriate size (256-bit for destination being AVX register, 128-bit for
packed operation with destination being SSE register, 64-bit for operation
................................................................................

<pre>
&nbsp;&nbsp;&nbsp;&nbsp;vfrczps&nbsp;ymm0,[esi]&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;;&nbsp;load&nbsp;fractional&nbsp;parts

</pre>

<div class="p"><!----></div>
<a 
id="vpcmov21888"></a><tt>vpcmov</tt> copies bits from either first or second source into destination
depending on the values of corresponding bits in the fourth operand (the
selector). If the bit in selector is set, the corresponding bit from first
source is copied into the same position in destination, otherwise the bit from
second source is copied. Either second source or selector can be memory
location, 128-bit or 256-bit depending on whether SSE registers or AVX
registers are specified as the other operands.

................................................................................

<div class="p"><!----></div>

<div class="p"><!----></div>
<a id="tth_tAb2.5">
</a> 
<div style="text-align:center">
<table border="1">
<tr><td align="center">Code </td><td align="center">Mnemonic </td><td align="center">Description </td></tr><tr><td></td></tr>
<tr><td align="center">0 </td><td align="center"><tt>lt</tt> </td><td align="center">less than </td></tr>
<tr><td align="center">1 </td><td align="center"><tt>le</tt> </td><td align="center">less than or equal </td></tr>
<tr><td align="center">2 </td><td align="center"><tt>gt</tt> </td><td align="center">greater than </td></tr>
<tr><td align="center">3 </td><td align="center"><tt>ge</tt> </td><td align="center">greater than or equal </td></tr>
<tr><td align="center">4 </td><td align="center"><tt>eq</tt> </td><td align="center">equal </td></tr>
<tr><td align="center">5 </td><td align="center"><tt>neq</tt> </td><td align="center">not equal </td></tr>
................................................................................
</div>

<div style="text-align:center">Table 2.5: XOP comparisons.</div>
<a id="tab:XOP_comparisons">
</a>

<div class="p"><!----></div>
<a 
id="vpermil2ps21889"></a><a 
id="vpermil2pd21890"></a><tt>vpermil2ps</tt> and <tt>vpermil2pd</tt> set the elements in destination register to
zero or to a value selected from first or second source depending on the
corresponding bit fields from the fourth operand (the selector) and the
immediate value provided in fifth operand. Refer to the AMD manuals for the
detailed explanation of the operation performed by these instructions. Each
of the first four operands can be a register, and either second source or
selector can be memory location, 128-bit or 256-bit depending on whether SSE
registers or AVX registers are used for the other operands.
................................................................................

<pre>
&nbsp;&nbsp;&nbsp;&nbsp;vpermil2ps&nbsp;ymm0,ymm3,ymm7,ymm2,0&nbsp;&nbsp;;&nbsp;permute&nbsp;from&nbsp;two&nbsp;sources

</pre>

<div class="p"><!----></div>
<a 
id="vphaddbw21891"></a><a 
id="vphaddubw21892"></a><a 
id="vphaddbd21893"></a><a 
id="vphaddubd21894"></a><a 
id="vphaddbq21895"></a><a 
id="vphaddubq21896"></a><a 
id="vphaddwd21897"></a><a 
id="vphadduwd21898"></a><a 
id="vphaddwq21899"></a><a 
id="vphadduwq21900"></a><a 
id="vphadddq21901"></a><a 
id="vphaddudq21902"></a><a 
id="vphsubbw21903"></a><a 
id="vphsubwd21904"></a><a 
id="vphsubdq21905"></a><a 

id="21906"></a><tt>vphaddbw</tt> adds pairs of adjacent signed bytes to form 16-bit values and
stores them at the same positions in destination. <tt>vphaddubw</tt> does the same
but treats the bytes as unsigned. <tt>vphaddbd</tt> and <tt>vphaddubd</tt> sum all bytes
(either signed or unsigned) in each four-byte block to 32-bit results,
<tt>vphaddbq</tt> and <tt>vphaddubq</tt> sum all bytes in each eight-byte block to
64-bit results, <tt>vphaddwd</tt> and <tt>vphadduwd</tt> add pairs of words to 32-bit
results, <tt>vphaddwq</tt> and <tt>vphadduwq</tt> sum all words in each four-word block to
64-bit results, <tt>vphadddq</tt> and <tt>vphaddudq</tt> add pairs of double words to 64-bit
results. <tt>vphsubbw</tt> subtracts in each two-byte block the byte at higher
position from the one at lower position, and stores the result as a signed
16-bit value at the corresponding position in destination, <tt>vphsubwd</tt>
subtracts in each two-word block the word at higher position from the one at
lower position and makes signed 32-bit results, <tt>vphsubdq</tt> subtracts in each
block of two double words the one at higher position from the one at lower
position and makes signed 64-bit results. Each of these instructions takes
two operands, the destination being SSE register, and the source being SSE
register or 128-bit memory.

<pre>
&nbsp;&nbsp;&nbsp;&nbsp;vphadduwq&nbsp;xmm0,xmm1&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;;&nbsp;sum&nbsp;quadruplets&nbsp;of&nbsp;words

</pre>

<div class="p"><!----></div>
<a 
id="vpmacsww21907"></a><a 
id="vpmacssww21908"></a><a 
id="vpmacsdd21909"></a><a 
id="vpmacssdd21910"></a><a 
id="vpmacswd21911"></a><a 
id="vpmacsswd21912"></a><a 
id="vpmacsdql21913"></a><a 
id="vpmacssdql21914"></a><a 
id="vpmacsdqh21915"></a><a 
id="vpmacssdqh21916"></a><a 
id="vpmadcswd21917"></a><a 
id="vpmadcsswd21918"></a><a 

id="21919"></a><a 
id="21920"></a><tt>vpmacsww</tt> and <tt>vpmacssww</tt> multiply the corresponding signed 16-bit values
from the first and second source and then add the products to the parallel
values from the third source, then <tt>vpmacsww</tt> takes the lowest 16 bits of the
result and <tt>vpmacssww</tt> saturates the result down to 16-bit value, and they
store the final 16-bit results in the destination. <tt>vpmacsdd</tt> and <tt>vpmacssdd</tt>
perform the analogous operation on 32-bit values. <tt>vpmacswd</tt> and <tt>vpmacsswd</tt> do
the same calculation only on the low 16-bit values from each 32-bit block and
form the 32-bit results. <tt>vpmacsdql</tt> and <tt>vpmacssdql</tt> perform such operation
................................................................................

<pre>
&nbsp;&nbsp;&nbsp;&nbsp;vpmacsdd&nbsp;xmm6,xmm1,[ebx],xmm6&nbsp;&nbsp;;&nbsp;accumulate&nbsp;product

</pre>

<div class="p"><!----></div>
<a 
id="vpperm21921"></a><tt>vpperm</tt> selects bytes from first and second source, optionally applies a
separate transformation to each of them, and stores them in the destination.
The bit fields in fourth operand (the selector) specify for each position in
destination what byte from which source is taken and what operation is applied
to it before it is stored there. Refer to the AMD manuals for the detailed
information about these bit fields. This instruction takes four operands,
either second source or selector can be a 128-bit memory (or they can be SSE
registers both), all the other operands have to be SSE registers.

<div class="p"><!----></div>
<a 
id="vpshlb21922"></a><a 
id="vpshlw21923"></a><a 
id="vpshld21924"></a><a 
id="vpshlq21925"></a><tt>vpshlb</tt>, <tt>vpshlw</tt>, <tt>vpshld</tt> and <tt>vpshlq</tt> shift logically bytes, words, double
words or quad words respectively. The amount of bits to shift by is specified
for each element separately by the signed byte placed at the corresponding
position in the third operand. The source containing elements to shift is
provided as second operand. Either second or third operand can be 128-bit
memory (or they can be SSE registers both) and the other operands have to be
SSE registers.

<pre>
&nbsp;&nbsp;&nbsp;&nbsp;vpshld&nbsp;xmm3,xmm1,[ebx]&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;;&nbsp;shift&nbsp;double&nbsp;words&nbsp;from&nbsp;xmm1

</pre>

<div class="p"><!----></div>
<a 
id="vpshab21926"></a><a 
id="vpshaw21927"></a><a 
id="vpshad21928"></a><a 
id="vpshaq21929"></a><a 
id="vprotb21930"></a><a 
id="vprotw21931"></a><a 
id="vprotd21932"></a><a 

id="vprotq21933"></a><tt>vpshab</tt>, <tt>vpshaw</tt>, <tt>vpshad</tt> and <tt>vpshaq</tt> arithmetically shift bytes, words,
double words or quad words. These instructions follow the same rules as the
logical shifts described above. <tt>vprotb</tt>, <tt>vprotw</tt>, <tt>vprotd</tt> and <tt>vprotq</tt>
rotate bytes, words, double words or quad words. They follow the same rules as
shifts, but additionally allow third operand to be immediate value, in which
case the same amount of rotation is specified for all the elements in source.

<pre>
&nbsp;&nbsp;&nbsp;&nbsp;vprotb&nbsp;xmm0,[esi],3&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;;&nbsp;rotate&nbsp;bytes&nbsp;to&nbsp;the&nbsp;left

</pre>

<div class="p"><!----></div>
<a 
id="movbe21934"></a>The MOVBE extension introduces just one new instruction, <tt>movbe</tt>, which
swaps bytes in value from source before storing it in destination, so can
be used to load and store big endian values. It takes two operands, either
the destination or source should be a 16-bit, 32-bit or 64-bit memory (the
last one being only allowed in long mode), and the other operand should be
a general register of the same size.
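
<div class="p"><!----></div>
As a short sketch (registers and addresses are arbitrary), both directions of
the transfer might be written as:

<pre>
&nbsp;&nbsp;&nbsp;&nbsp;movbe&nbsp;eax,[esi]&nbsp;&nbsp;&nbsp;;&nbsp;load&nbsp;32-bit&nbsp;big&nbsp;endian&nbsp;value
&nbsp;&nbsp;&nbsp;&nbsp;movbe&nbsp;[edi],ax&nbsp;&nbsp;&nbsp;&nbsp;;&nbsp;store&nbsp;16-bit&nbsp;value&nbsp;as&nbsp;big&nbsp;endian

</pre>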

<div class="p"><!----></div>
................................................................................
The BMI extension, consisting of two subsets - BMI1 and BMI2, introduces
new instructions operating on general registers, which use the same encoding
as AVX instructions and so allow the extended syntax. All these instructions
use 32-bit operands, and in long mode they also allow the forms with 64-bit
operands.

<div class="p"><!----></div>
<a 
id="andn21935"></a><tt>andn</tt> calculates the bitwise AND of second source with the inverted bits
of first source and stores the result in destination. The destination and
the first source have to be general registers, the second source can be
general register or memory.

<pre>
&nbsp;&nbsp;&nbsp;&nbsp;andn&nbsp;edx,eax,[ebx]&nbsp;&nbsp;&nbsp;;&nbsp;bit-multiply&nbsp;inverted&nbsp;eax&nbsp;with&nbsp;memory

</pre>

<div class="p"><!----></div>
<a 
id="bextr21936"></a><tt>bextr</tt> extracts from the first source the sequence of bits using an index
and length specified by bit fields in the second source operand and stores
it into destination. The lowest 8 bits of second source specify the position
of bit sequence to extract and the next 8 bits of second source specify the
length of sequence. The first source can be a general register or memory,
the other two operands have to be general registers.

<pre>
&nbsp;&nbsp;&nbsp;&nbsp;bextr&nbsp;eax,[esi],ecx&nbsp;&nbsp;;&nbsp;extract&nbsp;bit&nbsp;field&nbsp;from&nbsp;memory

</pre>

<div class="p"><!----></div>
<a 
id="blsi21937"></a><tt>blsi</tt> extracts the lowest set bit from the source, setting all the other
bits in destination to zero. The destination must be a general register,
the source can be general register or memory.

<pre>
&nbsp;&nbsp;&nbsp;&nbsp;blsi&nbsp;rax,r11&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;;&nbsp;isolate&nbsp;the&nbsp;lowest&nbsp;set&nbsp;bit

</pre>

<div class="p"><!----></div>
<a 
id="blsmsk21938"></a><a 
id="blsr21939"></a><tt>blsmsk</tt> sets all the bits in the destination up to the lowest set bit in
the source, including this bit. <tt>blsr</tt> copies all the bits from the source to
destination except for the lowest set bit, which is replaced by zero. These
instructions follow the same rules for operands as <tt>blsi</tt>.
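
<div class="p"><!----></div>
A small worked illustration, assuming for the sake of the example that ECX
holds the value 01011000b (lowest set bit at position 3):

<pre>
&nbsp;&nbsp;&nbsp;&nbsp;blsmsk&nbsp;eax,ecx&nbsp;&nbsp;&nbsp;;&nbsp;eax&nbsp;becomes&nbsp;00001111b
&nbsp;&nbsp;&nbsp;&nbsp;blsr&nbsp;edx,ecx&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;;&nbsp;edx&nbsp;becomes&nbsp;01010000b

</pre>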

<div class="p"><!----></div>
<a 
id="tzcnt21940"></a><a 
id="lzcnt21941"></a><tt>tzcnt</tt> counts the number of trailing zero bits, that is the zero bits up to
the lowest set bit of source value. This instruction is analogous to <tt>lzcnt</tt>
and follows the same rules for operands, so it also has a 16-bit version,
unlike the other BMI instructions.

<div class="p"><!----></div>
<a 
id="bzhi21942"></a><tt>bzhi</tt> is BMI2 instruction, which copies the bits from first source to
destination, zeroing all the bits up from the position specified by second
source. It follows the same rules for operands as <tt>bextr</tt>.
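
<div class="p"><!----></div>
For illustration, assuming ECX holds the value 8 in the first line (the
operands are otherwise arbitrary):

<pre>
&nbsp;&nbsp;&nbsp;&nbsp;bzhi&nbsp;eax,edx,ecx&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;;&nbsp;with&nbsp;ecx=8&nbsp;keep&nbsp;only&nbsp;the&nbsp;lowest&nbsp;8&nbsp;bits&nbsp;of&nbsp;edx
&nbsp;&nbsp;&nbsp;&nbsp;bzhi&nbsp;eax,[esi],ecx&nbsp;&nbsp;&nbsp;;&nbsp;the&nbsp;same&nbsp;with&nbsp;first&nbsp;source&nbsp;in&nbsp;memory

</pre>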

<div class="p"><!----></div>
<a 
id="pext21943"></a><a 
id="pdep21944"></a><tt>pext</tt> uses a mask in second source operand to select bits from first
source and puts the selected bits as a continuous sequence into destination.
<tt>pdep</tt> performs the reverse operation - it takes sequence of bits from the
first source and puts them consecutively at the positions where the bits in
second source are set, setting all the other bits in destination to zero.
These BMI2 instructions follow the same rules for operands as <tt>andn</tt>.
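
<div class="p"><!----></div>
A worked sketch may help here. The register values in the comments are
assumptions made up for this example: with the mask 0000F0F0h, <tt>pext</tt> collects
bits 4-7 and 12-15 of the source, and <tt>pdep</tt> scatters the low bits of the source
back to those positions.

<pre>
&nbsp;&nbsp;&nbsp;&nbsp;pext&nbsp;eax,edx,ecx&nbsp;&nbsp;&nbsp;;&nbsp;edx=12345678h,&nbsp;ecx=0000F0F0h&nbsp;gives&nbsp;eax=57h
&nbsp;&nbsp;&nbsp;&nbsp;pdep&nbsp;ebx,eax,ecx&nbsp;&nbsp;&nbsp;;&nbsp;eax=57h,&nbsp;ecx=0000F0F0h&nbsp;gives&nbsp;ebx=00005070h

</pre>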

<div class="p"><!----></div>
<a 
id="mulx21945"></a><tt>mulx</tt> is a BMI2 instruction which performs an unsigned multiplication of
value from EDX or RDX register (depending on the size of specified operands)
by the value from third operand, and stores the low half of result in the
second operand, and the high half of result in the first operand, and it does
it without affecting the flags. The third operand can be general register or
memory, and both the destination operands have to be general registers.

<pre>
&nbsp;&nbsp;&nbsp;&nbsp;mulx&nbsp;edx,eax,ecx&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;;&nbsp;multiply&nbsp;edx&nbsp;by&nbsp;ecx&nbsp;into&nbsp;edx:eax

</pre>

<div class="p"><!----></div>
<a 
id="shlx21946"></a><a 
id="shrx21947"></a><a 
id="sarx21948"></a><tt>shlx</tt>, <tt>shrx</tt> and <tt>sarx</tt> are BMI2 instructions, which perform logical or
arithmetical shifts of value from first source by the amount specified by
second source, and store the result in destination without affecting the
flags. They have the same rules for operands as <tt>bzhi</tt> instruction.
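
<div class="p"><!----></div>
A minimal sketch of these shifts, with operands chosen arbitrarily:

<pre>
&nbsp;&nbsp;&nbsp;&nbsp;shlx&nbsp;eax,[esi],ecx&nbsp;&nbsp;&nbsp;;&nbsp;shift&nbsp;value&nbsp;from&nbsp;memory&nbsp;left&nbsp;by&nbsp;count&nbsp;in&nbsp;ecx
&nbsp;&nbsp;&nbsp;&nbsp;sarx&nbsp;edx,eax,ebx&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;;&nbsp;arithmetic&nbsp;shift&nbsp;right&nbsp;by&nbsp;count&nbsp;in&nbsp;ebx

</pre>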

<div class="p"><!----></div>
<a 
id="rorx21949"></a><tt>rorx</tt> is a BMI2 instruction which rotates right the value from source
operand by the constant amount specified in third operand and stores the
result in destination without affecting the flags. The destination operand
has to be general register, the source operand can be general register or
memory, and the third operand has to be an immediate value.

<pre>
&nbsp;&nbsp;&nbsp;&nbsp;rorx&nbsp;eax,edx,7&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;;&nbsp;rotate&nbsp;without&nbsp;affecting&nbsp;flags

</pre>

<div class="p"><!----></div>
<a 
id="blsic21950"></a><a 
id="blsfill21951"></a>The TBM is an extension designed by AMD to supplement the BMI set. The
<tt>bextr</tt> instruction is extended with a new form, in which second source is
a 32-bit immediate value. <tt>blsic</tt> is a new instruction which performs the
same operation as <tt>blsi</tt>, but with the bits of result inverted. It uses the
same rules for operands as <tt>blsi</tt>. <tt>blsfill</tt> is a new instruction, which takes
the value from source, sets all the bits below the lowest set bit and stores
the result in destination, it also uses the same rules for operands as <tt>blsi</tt>.
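
<div class="p"><!----></div>
As a sketch of these forms (the operands and the value of ECX are assumptions
for illustration): the immediate for <tt>bextr</tt> uses the same layout as the
register form, with the position in the lowest 8 bits and the length in the
next 8 bits.

<pre>
&nbsp;&nbsp;&nbsp;&nbsp;bextr&nbsp;eax,edx,0804h&nbsp;&nbsp;;&nbsp;extract&nbsp;8&nbsp;bits&nbsp;starting&nbsp;at&nbsp;bit&nbsp;4
&nbsp;&nbsp;&nbsp;&nbsp;blsfill&nbsp;eax,ecx&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;;&nbsp;ecx=01011000b&nbsp;gives&nbsp;eax=01011111b

</pre>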

<div class="p"><!----></div>
<a 
id="blci21952"></a><a 
id="blcic21953"></a><a 
id="blcs21954"></a><a 
id="blcmsk21955"></a><a 
id="blcfill21956"></a><tt>blci</tt>, <tt>blcic</tt>, <tt>blcs</tt>, <tt>blcmsk</tt> and <tt>blcfill</tt> are instructions analogous
to <tt>blsi</tt>, <tt>blsic</tt>, <tt>blsr</tt>, <tt>blsmsk</tt> and <tt>blsfill</tt> respectively, but they
perform the bit-inverted versions of the same operations. They follow the
same rules for operands as the instructions they reflect.

<div class="p"><!----></div>
<a 
id="tzmsk21957"></a><a 
id="t1mskc21958"></a><tt>tzmsk</tt> finds the lowest set bit in value from source operand, sets all bits
below it to 1 and all the rest of bits to zero, then writes the result to
destination. <tt>t1mskc</tt> finds the least significant zero bit in the value from
source  operand, sets the bits below it to zero and all the other bits to 1,
and writes the result to destination. These instructions have the same rules
for operands as <tt>blsi</tt>.
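
<div class="p"><!----></div>
A worked illustration with 32-bit operands, where the source values named in
the comments are assumptions made for this example:

<pre>
&nbsp;&nbsp;&nbsp;&nbsp;tzmsk&nbsp;eax,ecx&nbsp;&nbsp;&nbsp;&nbsp;;&nbsp;ecx=01011000b&nbsp;gives&nbsp;eax=00000111b
&nbsp;&nbsp;&nbsp;&nbsp;t1mskc&nbsp;eax,edx&nbsp;&nbsp;&nbsp;;&nbsp;edx=00000111b&nbsp;gives&nbsp;eax=0FFFFFFF8h

</pre>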

<div class="p"><!----></div>
     <a id="tth_sEc2.1.24"></a><h3>
2.1.24&nbsp;&nbsp;Other extensions of instruction set</h3>

<div class="p"><!----></div>
There are a number of additional instruction set extensions recognized by flat
assembler, and the general syntax of the instructions introduced by those
extensions is provided here. For detailed information on the operations
performed by them, check out the manuals from Intel (for the VMX, SMX, XSAVE,
RDRAND, FSGSBASE, INVPCID, HLE and RTM extensions) or AMD (for the SVM extension).

<div class="p"><!----></div>
<a 
id="vmxon21959"></a><a 

id="vmxoff21960"></a><a 
id="vmlaunch21961"></a><a 
id="vmresume21962"></a><a 

id="vmcall21963"></a>The Virtual-Machine Extensions (VMX) provide a set of instructions for the
management of virtual machines. The <tt>vmxon</tt> instruction, which enters the VMX
operation, requires a single 64-bit memory operand, which should be a physical
address of memory region, which the logical processor may use to support VMX
operation. The <tt>vmxoff</tt> instruction, which leaves the VMX operation, has no
operands. The <tt>vmlaunch</tt> and <tt>vmresume</tt>, which launch or resume the virtual
machines, and <tt>vmcall</tt>, which allows guest software to call the VM monitor,
use no operands either.
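
<div class="p"><!----></div>
The syntax of these instructions might look as in the sketch below. It only
shows the operand forms, it is not a working hypervisor setup; the use of RBX
as a pointer to the 64-bit value holding the physical address is an
assumption made for the sake of the example:

<pre>
&nbsp;&nbsp;&nbsp;&nbsp;use64
&nbsp;&nbsp;&nbsp;&nbsp;vmxon&nbsp;qword&nbsp;[rbx]&nbsp;&nbsp;&nbsp;;&nbsp;enter&nbsp;the&nbsp;VMX&nbsp;operation
&nbsp;&nbsp;&nbsp;&nbsp;vmlaunch&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;;&nbsp;no&nbsp;operands
&nbsp;&nbsp;&nbsp;&nbsp;vmxoff&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;;&nbsp;leave&nbsp;the&nbsp;VMX&nbsp;operation

</pre>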

<div class="p"><!----></div>
<a 
id="vmptrld21964"></a><a 
id="vmptrst21965"></a><a 

id="vmclear21966"></a>The <tt>vmptrld</tt> loads the physical address of current Virtual Machine Control
Structure (VMCS) from its memory operand, <tt>vmptrst</tt> stores the pointer to
current VMCS into address specified by its memory operand, and <tt>vmclear</tt> sets
the launch state of the VMCS referenced by its memory operand to clear. These
three instructions all require a single 64-bit memory operand.

<div class="p"><!----></div>
<a 
id="vmread21967"></a><a 
id="vmwrite21968"></a>The <tt>vmread</tt> reads from VMCS a field specified by the source operand and
stores it into the destination operand. The source operand should be a
general purpose register, and the destination operand can be a register or
memory. The <tt>vmwrite</tt> writes into a VMCS field specified by the destination
operand the value provided by source operand. The source operand can be a
general purpose register or memory, and the destination operand must be a
register. The size of operands for those instructions should be 64-bit when
in long mode, and 32-bit otherwise.
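
<div class="p"><!----></div>
The allowed operand combinations can be sketched as follows for long mode;
the choice of registers is arbitrary and serves only as an illustration:

<pre>
&nbsp;&nbsp;&nbsp;&nbsp;use64
&nbsp;&nbsp;&nbsp;&nbsp;vmread&nbsp;rbx,rax&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;;&nbsp;field&nbsp;selected&nbsp;by&nbsp;rax&nbsp;goes&nbsp;into&nbsp;rbx
&nbsp;&nbsp;&nbsp;&nbsp;vmread&nbsp;[rsi],rax&nbsp;&nbsp;&nbsp;&nbsp;;&nbsp;destination&nbsp;in&nbsp;memory
&nbsp;&nbsp;&nbsp;&nbsp;vmwrite&nbsp;rax,rbx&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;;&nbsp;rbx&nbsp;goes&nbsp;into&nbsp;field&nbsp;selected&nbsp;by&nbsp;rax
&nbsp;&nbsp;&nbsp;&nbsp;vmwrite&nbsp;rax,[rsi]&nbsp;&nbsp;&nbsp;;&nbsp;source&nbsp;in&nbsp;memory

</pre>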

<div class="p"><!----></div>
<a 
id="invept21969"></a><a 
id="invvpid21970"></a>The <tt>invept</tt> and <tt>invvpid</tt> invalidate the translation lookaside buffers
(TLBs) and paging-structure caches, either derived from extended page tables
(EPT), or based on the virtual processor identifier (VPID). These instructions
require two operands, the first one being the general purpose register
specifying the type of invalidation, and the second one being a 128-bit
memory operand providing the invalidation descriptor. The first operand
should be a 64-bit register when in long mode, and 32-bit register otherwise.

<div class="p"><!----></div>
<a 
id="getsec21971"></a>The Safer Mode Extensions (SMX) provide the functionalities available
through the <tt>getsec</tt> instruction. This instruction takes no operands, and
the function that is executed is determined by the contents of EAX register
upon executing this instruction.

<div class="p"><!----></div>
<a 
id="skinit21972"></a>The Secure Virtual Machine (SVM) is a variant of virtual machine extension
used by AMD. The <tt>skinit</tt> instruction securely reinitializes the processor
allowing the startup of trusted software, such as the virtual machine monitor
(VMM). This instruction takes a single operand, which must be EAX, and
provides a physical address of the secure loader block (SLB).

<div class="p"><!----></div>
<a 
id="vmrun21973"></a><a 
id="vmsave21974"></a><a 
id="vmload21975"></a>The <tt>vmrun</tt> instruction is used to start a guest virtual machine,
its only operand should be an accumulator register (AX, EAX or RAX, the
last one available only in long mode) providing the physical address of the
virtual machine control block (VMCB). The <tt>vmsave</tt> stores a subset of
processor state into VMCB specified by its operand, and <tt>vmload</tt> loads the
same subset of processor state from a specified VMCB. The same operand rules
as for the <tt>vmrun</tt> apply to those two instructions.

<div class="p"><!----></div>
<a 
id="vmmcall21976"></a><tt>vmmcall</tt> allows the guest software to call the VMM. This instruction takes
no operands.

<div class="p"><!----></div>
<a 
id="stgi21977"></a><a 
id="clgi21978"></a><tt>stgi</tt> sets the global interrupt flag to 1, and <tt>clgi</tt> zeroes it. These
instructions take no operands.

<div class="p"><!----></div>
<a 
id="invlpga21979"></a><tt>invlpga</tt> invalidates the TLB mapping for a virtual page specified by the
first operand (which has to be accumulator register) and address space
identifier specified by the second operand (which must be ECX register).
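
<div class="p"><!----></div>
The operand forms of the SVM instructions described above can be summarized
with a short sketch. It assumes long mode, where RAX may serve as the
accumulator, and it shows only the syntax, not a complete guest setup:

<pre>
&nbsp;&nbsp;&nbsp;&nbsp;use64
&nbsp;&nbsp;&nbsp;&nbsp;vmload&nbsp;rax&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;;&nbsp;rax&nbsp;=&nbsp;physical&nbsp;address&nbsp;of&nbsp;VMCB
&nbsp;&nbsp;&nbsp;&nbsp;vmrun&nbsp;rax
&nbsp;&nbsp;&nbsp;&nbsp;vmsave&nbsp;rax
&nbsp;&nbsp;&nbsp;&nbsp;invlpga&nbsp;rax,ecx&nbsp;&nbsp;&nbsp;;&nbsp;page&nbsp;in&nbsp;rax,&nbsp;ASID&nbsp;in&nbsp;ecx

</pre>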

<div class="p"><!----></div>
<a 
id="xsave21980"></a><a 
id="xsaveopt21981"></a><a 
id="xrstor21982"></a><a 
id="xsave6421983"></a><a 
id="xsaveopt6421984"></a><a 

id="xrstor6421985"></a>The XSAVE set of instructions allows saving and restoring processor state
components. <tt>xsave</tt> and <tt>xsaveopt</tt> store the components of processor state
defined by bit mask in EDX and EAX registers into area defined by memory
operand. <tt>xrstor</tt> restores from the area specified by memory operand the
components of processor state defined by mask in EDX and EAX. The <tt>xsave64</tt>,
<tt>xsaveopt64</tt> and <tt>xrstor64</tt> are 64-bit versions of these instructions, allowed
only in long mode.
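
<div class="p"><!----></div>
A minimal sketch of saving and restoring follows. It assumes that EBX points
to a suitably sized and aligned save area, and the mask value is only an
example (according to Intel's manuals, bits 0 and 1 select the x87 and SSE
state):

<pre>
&nbsp;&nbsp;&nbsp;&nbsp;mov&nbsp;eax,3&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;;&nbsp;low&nbsp;32&nbsp;bits&nbsp;of&nbsp;the&nbsp;mask
&nbsp;&nbsp;&nbsp;&nbsp;xor&nbsp;edx,edx&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;;&nbsp;high&nbsp;32&nbsp;bits&nbsp;of&nbsp;the&nbsp;mask
&nbsp;&nbsp;&nbsp;&nbsp;xsave&nbsp;[ebx]
&nbsp;&nbsp;&nbsp;&nbsp;xrstor&nbsp;[ebx]

</pre>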

<div class="p"><!----></div>
<a 
id="xgetbv21986"></a><a 
id="xsetbv21987"></a><tt>xgetbv</tt> reads the contents of 64-bit XCR (extended control register)
specified in ECX register into EDX and EAX registers. <tt>xsetbv</tt> writes the
contents of EDX and EAX into the 64-bit XCR specified by ECX register. These
instructions have no operands.
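
<div class="p"><!----></div>
For example, reading the extended control register number 0 could be
sketched like this:

<pre>
&nbsp;&nbsp;&nbsp;&nbsp;xor&nbsp;ecx,ecx&nbsp;&nbsp;&nbsp;&nbsp;;&nbsp;select&nbsp;XCR&nbsp;number&nbsp;0
&nbsp;&nbsp;&nbsp;&nbsp;xgetbv&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;;&nbsp;its&nbsp;contents&nbsp;are&nbsp;now&nbsp;in&nbsp;edx&nbsp;and&nbsp;eax

</pre>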

<div class="p"><!----></div>
<a 
id="rdrand21988"></a>The RDRAND extension introduces one new instruction, <tt>rdrand</tt>, which loads
the hardware-generated random value into general register. It takes one
operand, which can be 16-bit, 32-bit or 64-bit register (with the last one
being allowed only in long mode).
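
<div class="p"><!----></div>
A typical use might look like the sketch below. The retry loop relies on a
detail that comes from Intel's documentation rather than from this manual:
the carry flag is set only when a random value was actually delivered. The
<tt>retry</tt> label is a name chosen just for this example:

<pre>
&nbsp;&nbsp;&nbsp;&nbsp;retry:
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;rdrand&nbsp;eax
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;jnc&nbsp;retry&nbsp;&nbsp;&nbsp;&nbsp;;&nbsp;no&nbsp;value&nbsp;was&nbsp;available,&nbsp;try&nbsp;again

</pre>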

<div class="p"><!----></div>
<a 
id="rdfsbase21989"></a><a 
id="rdgsbase21990"></a><a 
id="wrfsbase21991"></a><a 

id="wrgsbase21992"></a>The FSGSBASE extension adds long mode instructions that allow reading and
writing the segment base registers for FS and GS segments. <tt>rdfsbase</tt> and
<tt>rdgsbase</tt> read the corresponding segment base registers into operand, while
<tt>wrfsbase</tt> and <tt>wrgsbase</tt> write the value of operand into those registers.
All these instructions take one operand, which can be 32-bit or 64-bit general
register.

<div class="p"><!----></div>
<a 
id="invpcid21993"></a>The INVPCID extension adds <tt>invpcid</tt> instruction, which invalidates mapping
in the TLBs and paging caches based on the invalidation type specified in
first operand and PCID invalidate descriptor specified in second operand.
The first operand should be 32-bit general register when not in long mode,
or 64-bit general register when in long mode. The second operand should be
128-bit memory location.

<div class="p"><!----></div>
<a 
id="xacquire21994"></a><a 
id="xrelease21995"></a><a 
id="xbegin21996"></a><a 

id="xend21997"></a><a 
id="xabort21998"></a><a 
id="xtest21999"></a>The HLE and RTM extensions provide a set of instructions for transactional
management. The <tt>xacquire</tt> and <tt>xrelease</tt> are new prefixes that can be used
with some of the instructions to start or end lock elision on the memory
address specified by prefixed instruction. The <tt>xbegin</tt> instruction starts
the transactional execution, its operand is the address of a fallback routine
that gets executed in case of transaction abort, specified like the operand
for near jump instruction. <tt>xend</tt> marks the end of transactional execution
region, it takes no operands. <tt>xabort</tt> forces the transaction abort, it takes
an 8-bit immediate value as its only operand, this value is passed in the
highest bits of EAX to the fallback routine. <tt>xtest</tt> checks whether there is
transactional execution in progress, this instruction takes no operands.
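
<div class="p"><!----></div>
The RTM instructions described above could be combined as in this sketch.
The <tt>counter</tt>, <tt>fallback</tt> and <tt>done</tt> names are hypothetical, and
the non-transactional path using the <tt>lock</tt> prefix is just one possible
way of handling an abort:

<pre>
&nbsp;&nbsp;&nbsp;&nbsp;use32
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;xbegin&nbsp;fallback
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;inc&nbsp;dword&nbsp;[counter]&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;;&nbsp;executed&nbsp;transactionally
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;xend
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;jmp&nbsp;done
&nbsp;&nbsp;&nbsp;&nbsp;fallback:
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;lock&nbsp;inc&nbsp;dword&nbsp;[counter]&nbsp;&nbsp;;&nbsp;reached&nbsp;after&nbsp;an&nbsp;abort
&nbsp;&nbsp;&nbsp;&nbsp;done:
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;ret
&nbsp;&nbsp;&nbsp;&nbsp;counter&nbsp;dd&nbsp;0

</pre>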





























<div class="p"><!----></div>
 <a id="tth_sEc2.2"></a><h2>
2.2&nbsp;&nbsp;Control directives</h2>
<a id="sec:control">
</a>
This section describes the directives that control the assembly process, they
are processed during the assembly and may cause some blocks of instructions
to be assembled differently or not assembled at all.

<div class="p"><!----></div>
     <a id="tth_sEc2.2.1"></a><h3>
2.2.1&nbsp;&nbsp;Numerical constants</h3>
<a 
id="_221000"></a>
The <tt>=</tt> directive allows defining a numerical constant. It should be preceded by
the name for the constant and followed by the numerical expression providing the value.
The value of such constants can be a number or an address, but - unlike labels - the
numerical constants are not allowed to hold the register-based addresses.
Besides this difference, in their basic variant numerical constants behave
very much like labels and you can even forward-reference them (access their
values before they actually get defined).
................................................................................
</pre>
which declares label placed at <tt>ebp+4</tt> address. However remember that labels,
unlike numerical constants, cannot become assembly-time variables.

<div class="p"><!----></div>
     <a id="tth_sEc2.2.2"></a><h3>
2.2.2&nbsp;&nbsp;Conditional assembly</h3>
<a 
id="IF221001"></a>
<tt>if</tt> directive causes some block of instructions to be assembled only under
certain condition. It should be followed by logical expression specifying the
condition, instructions in next lines will be assembled only when this
condition is met, otherwise they will be skipped. The optional <tt>else&nbsp;if</tt>
directive followed with logical expression specifying additional condition
begins the next block of instructions that will be assembled if previous
conditions were not met, and the additional condition is met. The optional
................................................................................
<tt>eax,16&nbsp;eqtype&nbsp;fs,3+7</tt> condition is true, but <tt>eax,16&nbsp;eqtype&nbsp;eax,1.6</tt> is false.

<div class="p"><!----></div>
     <a id="tth_sEc2.2.3"></a><h3>
2.2.3&nbsp;&nbsp;Repeating blocks of instructions</h3>
<a id="sec:repeating">
</a>
<a 
id="TIMES221002"></a>
<tt>times</tt> directive repeats one instruction specified number of times. It
should be followed by numerical expression specifying number of repeats and
the instruction to repeat (optionally colon can be used to separate number and
instruction). When special symbol <tt>%</tt> is used inside the instruction, it is
equal to the number of current repeat. For example <tt>times&nbsp;5&nbsp;db&nbsp;%</tt> will define
five bytes with values 1, 2, 3, 4, 5. Recursive use of <tt>times</tt> directive is
also allowed, so <tt>times&nbsp;3&nbsp;times&nbsp;%&nbsp;db&nbsp;%</tt> will define six bytes with values
1, 1, 2, 1, 2, 3.
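
<div class="p"><!----></div>
The number of repeats may be any numerical expression, which makes <tt>times</tt>
handy for padding. The sketch below is only an illustration of that idea; it
assumes the common boot sector convention (512 bytes ending with the
<tt>0AA55h</tt> signature), which is not something this manual prescribes:

<pre>
&nbsp;&nbsp;&nbsp;&nbsp;org&nbsp;7C00h
&nbsp;&nbsp;&nbsp;&nbsp;jmp&nbsp;$
&nbsp;&nbsp;&nbsp;&nbsp;times&nbsp;510-($-$$)&nbsp;db&nbsp;0&nbsp;&nbsp;&nbsp;;&nbsp;fill&nbsp;with&nbsp;zeros&nbsp;up&nbsp;to&nbsp;offset&nbsp;510
&nbsp;&nbsp;&nbsp;&nbsp;dw&nbsp;0AA55h

</pre>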

<div class="p"><!----></div>
<a 
id="REPEAT221003"></a><tt>repeat</tt> directive repeats the whole block of instructions. It should be
followed by numerical expression specifying number of repeats. Instructions
to repeat are expected in next lines, ended with the <tt>end&nbsp;repeat</tt> directive,
for example:

<pre>
&nbsp;&nbsp;&nbsp;&nbsp;repeat&nbsp;8
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;mov&nbsp;byte&nbsp;[bx],%
................................................................................
addressed by BX register.

<div class="p"><!----></div>
Number of repeats can be zero, in that case the instructions are not
assembled at all.

<div class="p"><!----></div>
<a 
id="BREAK221004"></a>The <tt>break</tt> directive allows stopping the repetition early and continuing assembly
from the first line after the <tt>end&nbsp;repeat</tt>. Combined with the <tt>if</tt> directive it
allows stopping the repetition under some special condition, like:

<pre>
&nbsp;&nbsp;&nbsp;&nbsp;s&nbsp;=&nbsp;x/2
&nbsp;&nbsp;&nbsp;&nbsp;repeat&nbsp;100
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;if&nbsp;x/s&nbsp;=&nbsp;s
................................................................................
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;end&nbsp;if
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;s&nbsp;=&nbsp;(s+x/s)/2
&nbsp;&nbsp;&nbsp;&nbsp;end&nbsp;repeat

</pre>

<div class="p"><!----></div>
<a 
id="WHILE221005"></a>The <tt>while</tt> directive repeats the block of instructions as long as the
condition specified by the logical expression following it is true. The block
of instructions to be repeated should end with the <tt>end&nbsp;while</tt> directive.
Before each repetition the logical expression is evaluated and when its value
is false, the assembly is continued starting from the first line after the
<tt>end&nbsp;while</tt>. Also in this case the <tt>%</tt> symbol holds the number of current
repeat. The <tt>break</tt> directive can be used to stop this kind of loop in the same
way as with <tt>repeat</tt> directive. The previous sample can be rewritten to use the
................................................................................
however they should be closed in the same order in which they were started. The
<tt>break</tt> directive always stops processing the block that was started last with
either the <tt>repeat</tt> or <tt>while</tt> directive.

<div class="p"><!----></div>
     <a id="tth_sEc2.2.4"></a><h3>
2.2.4&nbsp;&nbsp;Addressing spaces</h3>
<a 
id="ORG221006"></a>
<tt>org</tt> directive sets address at which the following code is expected to
appear in memory. It should be followed by numerical expression specifying
the address. This directive begins the new addressing space, the following
code itself is not moved in any way, but all the labels defined within it
and the value of <tt>$</tt> symbol are affected as if it was put at the given
address. However it's the responsibility of programmer to put the code at
correct address at run-time.
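
<div class="p"><!----></div>
A small sketch of a DOS program in the COM format may illustrate this. Such
programs are loaded by the system at offset 100h, so this is the address
given to <tt>org</tt>; the <tt>message</tt> label is a name chosen for this example:

<pre>
&nbsp;&nbsp;&nbsp;&nbsp;org&nbsp;100h
&nbsp;&nbsp;&nbsp;&nbsp;mov&nbsp;ah,9
&nbsp;&nbsp;&nbsp;&nbsp;mov&nbsp;dx,message&nbsp;&nbsp;&nbsp;;&nbsp;100h&nbsp;plus&nbsp;the&nbsp;offset&nbsp;of&nbsp;message&nbsp;in&nbsp;file
&nbsp;&nbsp;&nbsp;&nbsp;int&nbsp;21h
&nbsp;&nbsp;&nbsp;&nbsp;int&nbsp;20h
&nbsp;&nbsp;&nbsp;&nbsp;message&nbsp;db&nbsp;'Hello!$'

</pre>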

<div class="p"><!----></div>
<a 
id="LOAD221007"></a>The <tt>load</tt> directive allows defining a constant with a binary value loaded
from the already assembled code. This directive should be followed by the name
of the constant, then optionally size operator, then <tt>from</tt> operator and a
numerical expression specifying a valid address in current addressing space.
The size operator has unusual meaning in this case - it states how many bytes
(up to 8) have to be loaded to form the binary value of constant. If no size
operator is specified, one byte is loaded (thus value is in range from 0 to
255). The loaded data cannot exceed current offset.
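
<div class="p"><!----></div>
The effect of the size operator can be seen in this short sketch, where
<tt>bytes</tt>, <tt>first</tt> and <tt>both</tt> are names made up for the example:

<pre>
&nbsp;&nbsp;&nbsp;&nbsp;bytes:
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;db&nbsp;12h,34h
&nbsp;&nbsp;&nbsp;&nbsp;load&nbsp;first&nbsp;byte&nbsp;from&nbsp;bytes&nbsp;&nbsp;&nbsp;;&nbsp;first&nbsp;=&nbsp;12h
&nbsp;&nbsp;&nbsp;&nbsp;load&nbsp;both&nbsp;word&nbsp;from&nbsp;bytes&nbsp;&nbsp;&nbsp;&nbsp;;&nbsp;both&nbsp;=&nbsp;3412h

</pre>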

<div class="p"><!----></div>
<a 
id="STORE221008"></a>The <tt>store</tt> directive can modify the already generated code by replacing
some of the previously generated data with the value defined by given
numerical expression, which follows. The expression can be preceded by the
optional size operator to specify how large value the expression defines, and
therefore how many bytes will be stored; if there is no size operator, the
size of one byte is assumed. Then the <tt>at</tt> operator and the numerical
expression defining the valid address in current addressing space, at
which the given value has to be stored, should follow. This is a directive for
................................................................................
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;store&nbsp;byte&nbsp;a&nbsp;xor&nbsp;c&nbsp;at&nbsp;$$+%-1
&nbsp;&nbsp;&nbsp;&nbsp;end&nbsp;repeat

</pre>
and each byte of code will be xored with the value defined by <tt>c</tt> constant.

<div class="p"><!----></div>
<a 
id="VIRTUAL221009"></a><tt>virtual</tt> defines virtual data at specified address. This data will not be
included in the output file, but labels defined there can be used in other
parts of source. This directive can be followed by <tt>at</tt> operator and the
numerical expression specifying the address for virtual data, otherwise it
uses current address, the same as <tt>virtual&nbsp;at&nbsp;$</tt>. Instructions defining data
are expected in next lines, ended with <tt>end&nbsp;virtual</tt> directive. The block of
virtual instructions itself is an independent addressing space, after it's
ended, the context of previous addressing space is restored.
................................................................................
limited by the boundaries of the block.                

<div class="p"><!----></div>
     <a id="tth_sEc2.2.5"></a><h3>
2.2.5&nbsp;&nbsp;Other directives</h3>
<a id="sec:other">
</a>
<a 
id="ALIGN221010"></a>
<tt>align</tt> directive aligns code or data to the specified boundary. It should
be followed by a numerical expression specifying the number of bytes, to a
multiple of which the current address has to be aligned. The boundary value
has to be a power of two.
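
<div class="p"><!----></div>
For instance, in the following sketch (with <tt>value</tt> being a name chosen
for the example, and assuming the addressing space starts at an address
divisible by 4) one byte gets skipped so that the double word is placed at
an aligned address:

<pre>
&nbsp;&nbsp;&nbsp;&nbsp;db&nbsp;1,2,3
&nbsp;&nbsp;&nbsp;&nbsp;align&nbsp;4
&nbsp;&nbsp;&nbsp;&nbsp;value&nbsp;dd&nbsp;12345678h&nbsp;&nbsp;&nbsp;;&nbsp;address&nbsp;divisible&nbsp;by&nbsp;4

</pre>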

<div class="p"><!----></div>
The <tt>align</tt> directive fills the bytes that had to be skipped to perform the
................................................................................

</pre>
The <tt>a</tt> constant is defined to be the difference between address after alignment
and address of the <tt>virtual</tt> block (see previous section), so it is equal to
the size of needed alignment space.

<div class="p"><!----></div>
<a 
id="DISPLAY221011"></a><tt>display</tt> directive displays the message at the assembly time. It should
be followed by the quoted strings or byte values, separated with commas. It
can be used to display values of some constants, for example:

<pre>
&nbsp;&nbsp;&nbsp;&nbsp;bits&nbsp;=&nbsp;16
&nbsp;&nbsp;&nbsp;&nbsp;display&nbsp;'Current&nbsp;offset&nbsp;is&nbsp;0x'
&nbsp;&nbsp;&nbsp;&nbsp;repeat&nbsp;bits/4
................................................................................
All preprocessor directives are processed before the main assembly process,
and therefore are not affected by the control directives. At this time also
all comments are stripped out.

<div class="p"><!----></div>
     <a id="tth_sEc2.3.1"></a><h3>
2.3.1&nbsp;&nbsp;Including source files</h3>
<a 
id="INCLUDE231012"></a>
<tt>include</tt> directive includes the specified source file at the position
where it is used. It should be followed by the quoted name of file that
should be included, for example:

<pre>
&nbsp;&nbsp;&nbsp;&nbsp;include&nbsp;'macros.inc'

................................................................................
<a id="sec:symbolic_constants">
</a>
The symbolic constants are different from the numerical constants, before the
assembly process they are replaced with their values everywhere in source
lines after their definitions, and anything can become their values.

<div class="p"><!----></div>
<a 
id="EQU231013"></a>The definition of symbolic constant consists of name of the constant followed
by the <tt>equ</tt> directive. Everything that follows this directive will
become the value of constant. If the value of symbolic constant contains
other symbolic constants, they are replaced with their values before
assigning this value to the new constant. For example:

<pre>
&nbsp;&nbsp;&nbsp;&nbsp;d&nbsp;equ&nbsp;dword
................................................................................
&nbsp;&nbsp;&nbsp;&nbsp;d&nbsp;equ&nbsp;d,eax

</pre>
the <tt>d</tt> constant would get the new value of <tt>edx,eax</tt>. This way the growing
lists of symbols can be defined.

<div class="p"><!----></div>
<a 
id="RESTORE231014"></a><tt>restore</tt> directive allows to get back previous value of redefined symbolic
constant. It should be followed by one or more names of symbolic constants,
separated with commas. So <tt>restore&nbsp;d</tt> after the above definitions will give
<tt>d</tt> constant back the value <tt>edx</tt>, the second one will restore it to value
<tt>dword</tt>, and one more will revert <tt>d</tt> to original meaning as if no such
constant was defined. If there was no constant defined of given name,
<tt>restore</tt> will not cause an error, it will be just ignored.

................................................................................

</pre>
After this definition <tt>mov&nbsp;ax,offset&nbsp;char</tt> will be valid construction
for copying the offset of <tt>char</tt> variable into <tt>ax</tt> register,
because <tt>offset</tt> is replaced with an empty value, and therefore ignored.

<div class="p"><!----></div>
<a 
id="DEFINE231015"></a>The <tt>define</tt> directive followed by the name of constant and then the value,
is the alternative way of defining symbolic constant. The only difference
between <tt>define</tt> and <tt>equ</tt> is that <tt>define</tt> assigns the value as it is, it does
not replace the symbolic constants with their values inside it.
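
<div class="p"><!----></div>
The difference can be sketched with three definitions, where <tt>x</tt>, <tt>y</tt>
and <tt>z</tt> are names used only for this illustration:

<pre>
&nbsp;&nbsp;&nbsp;&nbsp;x&nbsp;equ&nbsp;eax
&nbsp;&nbsp;&nbsp;&nbsp;y&nbsp;equ&nbsp;x&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;;&nbsp;the&nbsp;value&nbsp;of&nbsp;y&nbsp;is&nbsp;eax
&nbsp;&nbsp;&nbsp;&nbsp;define&nbsp;z&nbsp;x&nbsp;&nbsp;&nbsp;;&nbsp;the&nbsp;value&nbsp;of&nbsp;z&nbsp;is&nbsp;x

</pre>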

<div class="p"><!----></div>
<a 
id="FIX231016"></a>Symbolic constants can also be defined with the <tt>fix</tt> directive, which has
the same syntax as <tt>equ</tt>, but defines constants of high priority - they are
replaced with their symbolic values even before processing the preprocessor
directives and macroinstructions, the only exception is <tt>fix</tt> directive
itself, which has the highest possible priority, so it allows redefinition of
constants defined this way.

<div class="p"><!----></div>
................................................................................
with <tt>equ</tt> directive wouldn't give such result, as standard symbolic constants
are replaced with their values after searching the line for preprocessor
directives.

<div class="p"><!----></div>
     <a id="tth_sEc2.3.3"></a><h3>
2.3.3&nbsp;&nbsp;Macroinstructions</h3>
<a 
id="MACRO231017"></a>
<tt>macro</tt> directive allows you to define your own complex instructions,
called macroinstructions, the use of which can greatly simplify the process of
programming. In its simplest form it's similar to symbolic constant
definition. For example the following definition defines a shortcut for the
<tt>test&nbsp;al,0xFF</tt> instruction:

<pre>
................................................................................
<div class="p"><!----></div>
When it's needed to provide macroinstruction with argument that contains
some commas, such argument should be enclosed between <tt>&lt;</tt> and <tt>&gt;</tt>
characters. If it contains more than one <tt>&lt;</tt> character, the same number
of <tt>&gt;</tt> should be used to tell that the value of argument ends.
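
<div class="p"><!----></div>
As a small sketch of such an invocation, consider a hypothetical <tt>apply</tt>
macroinstruction (not one of the macroinstructions defined in this manual)
that takes an instruction name and its operands as two arguments:

<pre>
&nbsp;&nbsp;&nbsp;&nbsp;macro&nbsp;apply&nbsp;instr,args&nbsp;{&nbsp;instr&nbsp;args&nbsp;}
&nbsp;&nbsp;&nbsp;&nbsp;apply&nbsp;mov,&lt;eax,ebx&gt;&nbsp;&nbsp;&nbsp;&nbsp;;&nbsp;assembled&nbsp;as&nbsp;mov&nbsp;eax,ebx

</pre>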

<div class="p"><!----></div>
<a 
id="PURGE231018"></a><tt>purge</tt> directive allows removing the last definition of specified
macroinstruction. It should be followed by one or more names of
macroinstructions, separated with commas. If such macroinstruction has not
been defined, you will not get any error. For example after having the syntax of
<tt>mov</tt> extended with the macroinstructions defined above, you can disable
the syntax with three operands again by using <tt>purge&nbsp;mov</tt> directive. Next
<tt>purge&nbsp;mov</tt> will also disable the syntax for two operands being segment
registers, and all the next such directives will do nothing.

<div class="p"><!----></div>
If after the <tt>macro</tt> directive you enclose some group of arguments'
names in square brackets, it will allow giving more values for this group of
arguments when using that macroinstruction. Any more argument given after the
last argument of such group will begin the new group and will become the
first argument of it. That's why after closing the square bracket no more
argument names can follow. The contents of macroinstruction will be processed
for each such group of arguments separately. The simplest example is to
enclose one argument name in square brackets:

<pre>
&nbsp;&nbsp;&nbsp;&nbsp;macro&nbsp;stoschar&nbsp;[char]
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;{
................................................................................
&nbsp;&nbsp;&nbsp;&nbsp;stosb
&nbsp;&nbsp;&nbsp;&nbsp;mov&nbsp;al,3
&nbsp;&nbsp;&nbsp;&nbsp;stosb

</pre>

<div class="p"><!----></div>
<a 
id="LOCAL231019"></a>There are some special directives available only inside the definitions of
macroinstructions. <tt>local</tt> directive defines local names, which will be
replaced with unique values each time the macroinstruction is used. It should
be followed by names separated with commas. If the name given as parameter to <tt>local</tt> directive begins with a dot or two
dots, the unique labels generated by each evaluation of macroinstruction will
have the same properties. This directive is usually needed
for the constants or labels that macroinstruction defines and uses
internally.
................................................................................

</pre>
Each time this macroinstruction is used, <tt>move</tt> will become a different
unique name in its instructions, so you will not get an error you normally get
when some label is defined more than once.

<div class="p"><!----></div>
<a 
id="FORWARD231020"></a><a 
id="REVERSE231021"></a><a 

id="COMMON231022"></a><tt>forward</tt>, <tt>reverse</tt> and <tt>common</tt> directives divide
macroinstruction into blocks, each one processed after the processing of
previous is finished. They differ in behavior only if macroinstruction allows
multiple groups of arguments. Block of instructions that follows
<tt>forward</tt> directive is processed for each group of arguments, from
first to last - exactly like the default block (not preceded by any of these
directives). Block that follows <tt>reverse</tt> directive is processed
for each group of argument in reverse order - from last to first. Block that
................................................................................
</pre>
It is a very simplified kind of macroinstruction and it simply delegates a
block of instructions to be put at the end. 

<div class="p"><!----></div>
     <a id="tth_sEc2.3.4"></a><h3>
2.3.4&nbsp;&nbsp;Structures</h3>
<a 
id="STRUC231023"></a>
<tt>struc</tt> directive is a special variant of <tt>macro</tt> directive that is
used to define data structures. Macroinstruction defined using the
<tt>struc</tt> directive must be preceded by a label (like the data definition
directive) when it's used. This label will be also attached at the beginning
of every name starting with dot in the contents of macroinstruction. The
macroinstruction defined using the <tt>struc</tt> directive can have the same
name as some other macroinstruction defined using the <tt>macro</tt> directive,
................................................................................

<div class="p"><!----></div>
Defining data structures addressed by registers or absolute values should be
done using the <tt>virtual</tt> directive with structure macroinstruction
(see <a href="#sec:other">2.2.5</a>).

<div class="p"><!----></div>
<a 
id="RESTRUC231024"></a><tt>restruc</tt> directive removes the last definition of the structure, just like
<tt>purge</tt> does with macroinstructions and <tt>restore</tt> with symbolic constants.
It also has the same syntax - should be followed by one or more names of
structure macroinstructions, separated with commas.

<div class="p"><!----></div>
     <a id="tth_sEc2.3.5"></a><h3>
2.3.5&nbsp;&nbsp;Repeating macroinstructions</h3>
<a 
id="REPT231025"></a>
The <tt>rept</tt> directive is a special kind of macroinstruction, which makes a given
number of duplicates of the block enclosed with braces. The basic syntax is
<tt>rept</tt> directive followed by number and then block of source enclosed between
the <tt>{</tt> and <tt>}</tt> characters. The simplest example:

<pre>
&nbsp;&nbsp;&nbsp;&nbsp;rept&nbsp;5&nbsp;{&nbsp;in&nbsp;al,dx&nbsp;}
................................................................................
of expression associated with symbolic constant is calculated first, and then
substituted into the outer expression in place of that constant). If you need
repetitions based on values that can only be calculated at assembly time, use
one of the code repeating directives that are processed by assembler, see
section <a href="#sec:repeating">2.2.3</a>.

<div class="p"><!----></div>
<a 
id="IRP231026"></a>The <tt>irp</tt> directive iterates the single argument through the given list of
parameters. The syntax is <tt>irp</tt> followed by the argument name, then the comma
and then the list of parameters. The parameters are specified in the same
way as in the invocation of standard macroinstruction, so they have to be
separated with commas and each one can be enclosed with the <tt>&lt;</tt> and <tt>&gt;</tt>
characters. Also the name of argument may be followed by <tt>*</tt> to mark that it
cannot get an empty value. Such block:

................................................................................

<pre>
&nbsp;&nbsp;&nbsp;db&nbsp;2
&nbsp;&nbsp;&nbsp;db&nbsp;3
&nbsp;&nbsp;&nbsp;db&nbsp;5

</pre>
<a 
id="IRPS231027"></a>
The <tt>irps</tt> directive iterates through the given list of symbols, it should
be followed by the argument name, then the comma and then the sequence of any
symbols. Each symbol in this sequence, no matter whether it is the name
symbol, symbol character or quoted string, becomes an argument value for one
iteration. If there are no symbols following the comma, no iteration is done
at all. This example:

................................................................................

<pre>
&nbsp;&nbsp;&nbsp;xor&nbsp;al,al
&nbsp;&nbsp;&nbsp;xor&nbsp;bx,bx
&nbsp;&nbsp;&nbsp;xor&nbsp;ecx,ecx

</pre>
























The blocks defined by the <tt>irp</tt> and <tt>irps</tt> directives are also processed in
the same way as any macroinstructions, so operators and directives specific
to macroinstructions may be freely used also in this case.

<div class="p"><!----></div>
     <a id="tth_sEc2.3.6"></a><h3>
2.3.6&nbsp;&nbsp;Conditional preprocessing</h3>
<a id="sec:conditional_preprocessing">
</a>
<a 
id="MATCH231028"></a>
<tt>match</tt> directive causes some block of source to be preprocessed and passed
to assembler only when the given sequence of symbols matches the specified
pattern. The pattern comes first, ended with comma, then the symbols
that have to be matched with the pattern, and finally the block of
source, enclosed within braces as macroinstruction.

<div class="p"><!----></div>
................................................................................
on.

<div class="p"><!----></div>
 <a id="tth_sEc2.4"></a><h2>
2.4&nbsp;&nbsp;Formatter directives</h2>
<a id="sec:formatter">
</a>
<a 
id="FORMAT241029"></a>
These directives are actually also a kind of control directives, with the
purpose of controlling the format of generated code.

<div class="p"><!----></div>
<tt>format</tt> directive followed by the format identifier allows selecting
the output format. This directive should be put at the beginning of the
source. Default output format is a flat binary file, it can also be selected
................................................................................
by using <tt>format&nbsp;binary</tt> directive.
This directive can be followed by the <tt>as</tt> keyword
and the quoted string specifying the default file extension for the output
file. Unless the output file name was specified from the command line,
assembler will use this extension when generating the output file.
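
<div class="p"><!----></div>
For example, the following line (with the extension chosen arbitrarily for
this illustration) selects the flat binary format and makes the output file
get the <tt>img</tt> extension by default:

<pre>
&nbsp;&nbsp;&nbsp;&nbsp;format&nbsp;binary&nbsp;as&nbsp;'img'

</pre>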

<div class="p"><!----></div>
<a 
id="USE16__USE32__USE64241030"></a><tt>use16</tt> and <tt>use32</tt> directives force the assembler to generate 16-bit or
32-bit code, overriding the default setting for selected output format. <tt>use64</tt>
enables generating the code for the long mode of x86-64 processors.

<div class="p"><!----></div>
Below are described different output formats with the directives
specific to these formats.

................................................................................
<div class="p"><!----></div>
     <a id="tth_sEc2.4.1"></a><h3>
2.4.1&nbsp;&nbsp;MZ executable</h3>
To select the MZ output format, use <tt>format&nbsp;MZ</tt> directive. The default
code setting for this format is 16-bit.

<div class="p"><!----></div>
<a 
id="SEGMENT241031"></a><tt>segment</tt> directive defines a new segment, it should be followed by
label, whose value will be the number of defined segment, optionally
<tt>use16</tt> or <tt>use32</tt> word can follow to specify whether code in this
segment should be 16-bit or 32-bit. The origin of segment is aligned to
paragraph (16 bytes). All the labels defined then will have values relative
to the beginning of this segment.

<div class="p"><!----></div>
<a 
id="ENTRY241032"></a>The <tt>entry</tt> directive sets the entry point for the MZ executable. It should be
followed by the far address (the name of the segment, a colon and the offset inside
the segment) of the desired entry point.

<div class="p"><!----></div>
<a 
id="STACK241033"></a>The <tt>stack</tt> directive sets up the stack for the MZ executable. It can be
followed by a numerical expression specifying the size of the stack to be created
automatically, or by the far address of the initial stack frame when you want to
set up the stack manually. When no stack is defined, a stack of the default
size of 4096 bytes is created.

<div class="p"><!----></div>
<a 
id="HEAP241034"></a>The <tt>heap</tt> directive should be followed by a 16-bit value defining the maximum
size of the additional heap in paragraphs (this is heap in addition to the stack and
undefined data). Use <tt>heap&nbsp;0</tt> to always allocate only the memory the program
really needs. The default size of the heap is 65535.
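
<div class="p"><!----></div>
The following sketch shows how these directives might fit together in a small MZ program. The segment and label names (<tt>main</tt>, <tt>text</tt>, <tt>start</tt>, <tt>hello</tt>) and the stack size are made up for this illustration:

<pre>
&nbsp;&nbsp;&nbsp;&nbsp;format&nbsp;MZ
&nbsp;&nbsp;&nbsp;&nbsp;entry&nbsp;main:start&nbsp;&nbsp;&nbsp;&nbsp;;&nbsp;far&nbsp;address:&nbsp;segment&nbsp;name,&nbsp;colon,&nbsp;offset
&nbsp;&nbsp;&nbsp;&nbsp;stack&nbsp;100h&nbsp;&nbsp;&nbsp;&nbsp;;&nbsp;256-byte&nbsp;stack,&nbsp;created&nbsp;automatically
&nbsp;&nbsp;&nbsp;&nbsp;heap&nbsp;0&nbsp;&nbsp;&nbsp;&nbsp;;&nbsp;allocate&nbsp;only&nbsp;the&nbsp;memory&nbsp;the&nbsp;program&nbsp;needs

&nbsp;&nbsp;&nbsp;&nbsp;segment&nbsp;main&nbsp;use16
&nbsp;&nbsp;&nbsp;&nbsp;start:
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;mov&nbsp;ax,text&nbsp;&nbsp;&nbsp;&nbsp;;&nbsp;the&nbsp;segment&nbsp;label&nbsp;gives&nbsp;the&nbsp;segment&nbsp;number
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;mov&nbsp;ds,ax
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;mov&nbsp;dx,hello&nbsp;&nbsp;&nbsp;&nbsp;;&nbsp;offset&nbsp;relative&nbsp;to&nbsp;the&nbsp;start&nbsp;of&nbsp;segment&nbsp;text
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;mov&nbsp;ah,9&nbsp;&nbsp;&nbsp;&nbsp;;&nbsp;DOS&nbsp;function:&nbsp;print&nbsp;'$'-terminated&nbsp;string
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;int&nbsp;21h
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;mov&nbsp;ax,4C00h&nbsp;&nbsp;&nbsp;&nbsp;;&nbsp;DOS&nbsp;function:&nbsp;terminate&nbsp;program
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;int&nbsp;21h

&nbsp;&nbsp;&nbsp;&nbsp;segment&nbsp;text
&nbsp;&nbsp;&nbsp;&nbsp;hello&nbsp;db&nbsp;'Hello&nbsp;world!',24h

</pre>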

<div class="p"><!----></div>
     <a id="tth_sEc2.4.2"></a><h3>
2.4.2&nbsp;&nbsp;Portable Executable</h3>
To select the Portable Executable output format, use the <tt>format&nbsp;PE</tt> directive.
It can be followed by additional format settings: first the target subsystem
setting, which can be <tt>console</tt> or <tt>GUI</tt> for Windows applications, <tt>native</tt>
for Windows drivers, or <tt>EFI</tt>, <tt>EFIboot</tt> or <tt>EFIruntime</tt> for UEFI; it may be
followed by the minimum version of the system that the executable is targeted to
(specified in the form of a floating-point value). The optional <tt>DLL</tt> and <tt>WDM</tt> keywords
mark the output file as a dynamic link library and a WDM driver respectively,
and the <tt>large</tt> keyword marks the executable as able to handle addresses
larger than 2 GB.


<div class="p"><!----></div>
These settings can be followed by the <tt>at</tt> operator and a numerical expression
specifying the base of the PE image, and then optionally by the <tt>on</tt> operator followed by
a quoted string containing a file name, which selects a custom MZ stub for the PE program
(when the specified file is not an MZ executable, it is treated as a flat binary
executable file and converted into MZ format). The default code setting for
................................................................................

<div class="p"><!----></div>
To create a PE file for the x86-64 architecture, use the <tt>PE64</tt> keyword instead of
<tt>PE</tt> in the format declaration; in that case long mode code is generated
by default.
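
<div class="p"><!----></div>
A few declarations of this kind are sketched below. Each line is a separate alternative, since a source file carries a single format declaration; the version numbers, the image base and the stub file name <tt>'stub.exe'</tt> are placeholder values chosen for this illustration:

<pre>
&nbsp;&nbsp;&nbsp;&nbsp;format&nbsp;PE&nbsp;console&nbsp;&nbsp;&nbsp;&nbsp;;&nbsp;32-bit&nbsp;console&nbsp;application
&nbsp;&nbsp;&nbsp;&nbsp;format&nbsp;PE&nbsp;GUI&nbsp;4.0&nbsp;DLL&nbsp;at&nbsp;7000000h&nbsp;on&nbsp;'stub.exe'&nbsp;&nbsp;&nbsp;&nbsp;;&nbsp;32-bit&nbsp;DLL&nbsp;with&nbsp;custom&nbsp;base&nbsp;and&nbsp;stub
&nbsp;&nbsp;&nbsp;&nbsp;format&nbsp;PE64&nbsp;GUI&nbsp;5.0&nbsp;&nbsp;&nbsp;&nbsp;;&nbsp;64-bit&nbsp;GUI&nbsp;application,&nbsp;long&nbsp;mode&nbsp;code&nbsp;by&nbsp;default

</pre>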

<div class="p"><!----></div>
<a 
id="SECTION241035"></a>The <tt>section</tt> directive defines a new section. It should be
followed by a quoted string defining the name of the section, and then one
or more section flags can follow. The available flags are:
<tt>code</tt>, <tt>data</tt>, <tt>readable</tt>, <tt>writeable</tt>,
<tt>executable</tt>, <tt>shareable</tt>, <tt>discardable</tt>,
<tt>notpageable</tt>. The origin of the section is aligned to a page (4096
bytes). Example declaration of a PE section:

................................................................................
<pre>
&nbsp;&nbsp;&nbsp;&nbsp;section&nbsp;'.reloc'&nbsp;data&nbsp;readable&nbsp;discardable&nbsp;fixups
&nbsp;&nbsp;&nbsp;&nbsp;section&nbsp;'.rsrc'&nbsp;data&nbsp;readable&nbsp;resource&nbsp;from&nbsp;'my.res'

</pre>
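
<div class="p"><!----></div>
For comparison with the special-data sections above, ordinary code and data sections could be declared as in this sketch. The section names are conventional choices used here for illustration, not requirements:

<pre>
&nbsp;&nbsp;&nbsp;&nbsp;section&nbsp;'.text'&nbsp;code&nbsp;readable&nbsp;executable
&nbsp;&nbsp;&nbsp;&nbsp;section&nbsp;'.data'&nbsp;data&nbsp;readable&nbsp;writeable

</pre>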

<div class="p"><!----></div>
<a 
id="ENTRY241036"></a>The <tt>entry</tt> directive sets the entry point for the Portable Executable; the
value of the entry point should follow.

<div class="p"><!----></div>
<a 
id="STACK241037"></a>The <tt>stack</tt> directive sets the size of the stack for the Portable Executable.
The value of the stack reserve size should follow, optionally followed by the value of the
stack commit, separated by a comma. When the stack is not defined, it is set by
default to a size of 4096 bytes.

<div class="p"><!----></div>
<a 
id="HEAP241038"></a>The <tt>heap</tt> directive chooses the size of the heap for the Portable Executable.
The value of the heap reserve size should follow, optionally followed by the value of the
heap commit, separated by a comma. When no heap is defined, it is set by default to a size
of 65536 bytes; when the size of the heap commit is unspecified, it is by default set
to zero.
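
<div class="p"><!----></div>
A minimal skeleton using these three directives might look as follows. The sizes and the <tt>start</tt> label are arbitrary values chosen for this sketch, and the body of the program is only a placeholder:

<pre>
&nbsp;&nbsp;&nbsp;&nbsp;format&nbsp;PE&nbsp;console
&nbsp;&nbsp;&nbsp;&nbsp;entry&nbsp;start&nbsp;&nbsp;&nbsp;&nbsp;;&nbsp;value&nbsp;of&nbsp;the&nbsp;entry&nbsp;point
&nbsp;&nbsp;&nbsp;&nbsp;stack&nbsp;10000h,1000h&nbsp;&nbsp;&nbsp;&nbsp;;&nbsp;reserve&nbsp;64&nbsp;KB,&nbsp;commit&nbsp;4&nbsp;KB
&nbsp;&nbsp;&nbsp;&nbsp;heap&nbsp;100000h&nbsp;&nbsp;&nbsp;&nbsp;;&nbsp;reserve&nbsp;1&nbsp;MB,&nbsp;commit&nbsp;left&nbsp;at&nbsp;its&nbsp;default

&nbsp;&nbsp;&nbsp;&nbsp;section&nbsp;'.text'&nbsp;code&nbsp;readable&nbsp;executable
&nbsp;&nbsp;&nbsp;&nbsp;start:
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;xor&nbsp;eax,eax&nbsp;&nbsp;&nbsp;&nbsp;;&nbsp;placeholder&nbsp;body
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;ret

</pre>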

<div class="p"><!----></div>
<a 
id="DATA241039"></a><a 
id="END241040"></a>The <tt>data</tt> directive begins the definition of special PE data. It should be
followed by one of the data identifiers (<tt>export</tt>, <tt>import</tt>,
<tt>resource</tt> or <tt>fixups</tt>) or by the number of the data entry in the PE
header. The data should be defined in the following lines, ended with the <tt>end&nbsp;data</tt>
directive. When the fixups data definition is chosen, the fixups are generated
automatically and no more data needs to be defined there.
The same applies to the resource data when the <tt>resource</tt>
identifier is followed by the <tt>from</tt> operator and a quoted file name -
................................................................................
directive, depending on whether you want to create the classic (DJGPP) or Microsoft's
variant of a COFF file. The default code setting for this format is 32-bit. To
create a file in Microsoft's COFF format for the x86-64 architecture, use the
<tt>format&nbsp;MS64&nbsp;COFF</tt> setting; in that case long mode code is generated by
default.

<div class="p"><!----></div>
<a 
id="SECTION241041"></a>The <tt>section</tt> directive defines a new section. It should be followed by
a quoted string defining the name of the section, and then one or more section flags
can follow. The section flags available for both COFF variants are <tt>code</tt> and <tt>data</tt>,
while the flags <tt>readable</tt>, <tt>writeable</tt>, <tt>executable</tt>, <tt>shareable</tt>, <tt>discardable</tt>,
<tt>notpageable</tt>, <tt>linkremove</tt> and <tt>linkinfo</tt> are available only with
Microsoft's COFF variant.

<div class="p"><!----></div>
By default a section is aligned
to a double word (four bytes). In the case of the Microsoft COFF variant, another alignment
can be specified by providing the <tt>align</tt> operator followed by the alignment value
(any power of two up to 8192) among the section flags.
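
<div class="p"><!----></div>
For instance, section declarations for the Microsoft variant with and without an explicit alignment could be sketched as follows (the section names and the alignment value of 16 are illustrative choices):

<pre>
&nbsp;&nbsp;&nbsp;&nbsp;format&nbsp;MS&nbsp;COFF
&nbsp;&nbsp;&nbsp;&nbsp;section&nbsp;'.text'&nbsp;code&nbsp;readable&nbsp;executable&nbsp;align&nbsp;16&nbsp;&nbsp;&nbsp;&nbsp;;&nbsp;explicit&nbsp;16-byte&nbsp;alignment
&nbsp;&nbsp;&nbsp;&nbsp;section&nbsp;'.data'&nbsp;data&nbsp;readable&nbsp;writeable&nbsp;&nbsp;&nbsp;&nbsp;;&nbsp;default&nbsp;double&nbsp;word&nbsp;alignment

</pre>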

<div class="p"><!----></div>
<a 
id="EXTRN241042"></a>The <tt>extrn</tt> directive defines an external symbol. It should be
followed by the name of the symbol and optionally by the size operator
specifying the size of the data labeled by this symbol. The name of the
symbol can also be preceded by a quoted string containing the name of
the external symbol and the <tt>as</tt> operator. Some example
declarations of external symbols:

<pre>
&nbsp;&nbsp;&nbsp;&nbsp;extrn&nbsp;exit
&nbsp;&nbsp;&nbsp;&nbsp;extrn&nbsp;'__imp__MessageBoxA@16'&nbsp;as&nbsp;MessageBox:dword

</pre>
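
<div class="p"><!----></div>
To show such a declaration in use, here is a sketch of a small 32-bit object file that calls the imported function through the declared symbol. It assumes the object is later linked against an import library that provides <tt>MessageBoxA</tt>; the names <tt>_demo</tt>, <tt>_caption</tt> and <tt>_message</tt> are hypothetical, and the <tt>public</tt> directive used to export the routine is described in the next paragraph:

<pre>
&nbsp;&nbsp;&nbsp;&nbsp;format&nbsp;MS&nbsp;COFF
&nbsp;&nbsp;&nbsp;&nbsp;extrn&nbsp;'__imp__MessageBoxA@16'&nbsp;as&nbsp;MessageBox:dword

&nbsp;&nbsp;&nbsp;&nbsp;section&nbsp;'.text'&nbsp;code&nbsp;readable&nbsp;executable
&nbsp;&nbsp;&nbsp;&nbsp;public&nbsp;_demo&nbsp;&nbsp;&nbsp;&nbsp;;&nbsp;make&nbsp;the&nbsp;routine&nbsp;visible&nbsp;to&nbsp;the&nbsp;linker
&nbsp;&nbsp;&nbsp;&nbsp;_demo:
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;push&nbsp;0
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;push&nbsp;_caption
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;push&nbsp;_message
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;push&nbsp;0
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;call&nbsp;[MessageBox]&nbsp;&nbsp;&nbsp;&nbsp;;&nbsp;indirect&nbsp;call&nbsp;through&nbsp;the&nbsp;dword-sized&nbsp;symbol
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;ret

&nbsp;&nbsp;&nbsp;&nbsp;section&nbsp;'.data'&nbsp;data&nbsp;readable&nbsp;writeable
&nbsp;&nbsp;&nbsp;&nbsp;_caption&nbsp;db&nbsp;'Caption',0
&nbsp;&nbsp;&nbsp;&nbsp;_message&nbsp;db&nbsp;'Message&nbsp;text',0

</pre>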

<div class="p"><!----></div>
<a 
id="PUBLIC241043"></a>The <tt>public</tt> directive declares an existing symbol as public. It
should be followed by the name of the symbol; optionally, it can be
followed by the <tt>as</tt> operator and a quoted string
containing the name under which the symbol should be available as public.
Some examples of public symbol declarations:

<pre>
&nbsp;&nbsp;&nbsp;&nbsp;public&nbsp;main
................................................................................
2.4.4&nbsp;&nbsp;Executable and Linkable Format</h3>
To select the ELF output format, use the <tt>format&nbsp;ELF</tt> directive. The default code
setting for this format is 32-bit. To create an ELF file for the x86-64
architecture, use the <tt>format&nbsp;ELF64</tt> directive; in that case long mode code is
generated by default.

<div class="p"><!----></div>
<a 
id="SECTION241044"></a>The <tt>section</tt> directive defines a new section. It should be followed by a quoted
string defining the name of the section, then one or both of the
<tt>executable</tt> and <tt>writeable</tt> flags can follow, and optionally also the <tt>align</tt> operator
followed by a number specifying the alignment of the section (it has to be a power of
two). If no alignment is specified, the default value is used, which is 4 or 8,
depending on which format variant has been chosen.
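
<div class="p"><!----></div>
As a short sketch, ELF section declarations with the default and with an explicit alignment might look like this (the section names and the alignment value are illustrative):

<pre>
&nbsp;&nbsp;&nbsp;&nbsp;format&nbsp;ELF64
&nbsp;&nbsp;&nbsp;&nbsp;section&nbsp;'.text'&nbsp;executable&nbsp;&nbsp;&nbsp;&nbsp;;&nbsp;default&nbsp;alignment&nbsp;is&nbsp;used
&nbsp;&nbsp;&nbsp;&nbsp;section&nbsp;'.data'&nbsp;writeable&nbsp;align&nbsp;16&nbsp;&nbsp;&nbsp;&nbsp;;&nbsp;explicit&nbsp;power-of-two&nbsp;alignment

</pre>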

<div class="p"><!----></div>
<a 
id="EXTRN241045"></a><a 
id="PUBLIC241046"></a>The <tt>extrn</tt> and <tt>public</tt> directives have the same meaning and syntax as
when the COFF output format is selected (described in the previous section).
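
<div class="p"><!----></div>
The sketch below puts these directives together in a small 32-bit ELF object file. It assumes the object is linked with a 32-bit C library that provides <tt>printf</tt> and calls <tt>main</tt>; the label <tt>msg</tt> and the message text are made up for this illustration:

<pre>
&nbsp;&nbsp;&nbsp;&nbsp;format&nbsp;ELF
&nbsp;&nbsp;&nbsp;&nbsp;public&nbsp;main&nbsp;&nbsp;&nbsp;&nbsp;;&nbsp;exported&nbsp;so&nbsp;the&nbsp;linker&nbsp;can&nbsp;find&nbsp;it
&nbsp;&nbsp;&nbsp;&nbsp;extrn&nbsp;printf&nbsp;&nbsp;&nbsp;&nbsp;;&nbsp;resolved&nbsp;by&nbsp;the&nbsp;linker

&nbsp;&nbsp;&nbsp;&nbsp;section&nbsp;'.text'&nbsp;executable
&nbsp;&nbsp;&nbsp;&nbsp;main:
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;push&nbsp;msg
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;call&nbsp;printf
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;add&nbsp;esp,4&nbsp;&nbsp;&nbsp;&nbsp;;&nbsp;caller&nbsp;removes&nbsp;the&nbsp;argument&nbsp;(C&nbsp;calling&nbsp;convention)
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;xor&nbsp;eax,eax&nbsp;&nbsp;&nbsp;&nbsp;;&nbsp;return&nbsp;value&nbsp;0
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;ret

&nbsp;&nbsp;&nbsp;&nbsp;section&nbsp;'.data'&nbsp;writeable
&nbsp;&nbsp;&nbsp;&nbsp;msg&nbsp;db&nbsp;'Hello&nbsp;from&nbsp;ELF',0Ah,0

</pre>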

<div class="p"><!----></div>
The <tt>rva</tt> operator can also be used with this format (though not
when the target architecture is x86-64); it converts the address into an offset
relative to the GOT table, so it may be useful for creating position-independent
code. There's also a special <tt>plt</tt> operator, which allows to call the external
................................................................................
&nbsp;&nbsp;.repeat
&nbsp;&nbsp;&nbsp;&nbsp;add&nbsp;ecx,2
&nbsp;&nbsp;.until&nbsp;ecx&#62;100

</pre>

<div class="p"><!----></div>

<br /><br /><hr /><small>File translated from
T<sub><span class="small">E</span></sub>X
by <a href="http://hutchinson.belmont.ma.us/tth/">
T<sub><span class="small">T</span></sub>H</a>,
version 4.03.<br />On 27 Feb 2014, 16:05.</small>
</div></body></html>
4151
4152
4153
4154
4155
4156
4157
4158
4159
4160
....
4172
4173
4174
4175
4176
4177
4178
4179
4180
4181
4182
4183
4184
4185
4186
4187
4188
4189

4190
4191
4192
4193
4194
4195
4196
4197
4198
4199
4200
4201
4202
4203
4204
4205
4206
4207
4208
4209
4210
4211
4212
4213
4214
4215
4216
4217
4218
4219
4220
4221
....
4223
4224
4225
4226
4227
4228
4229
4230
4231
4232
4233
4234
4235
4236
4237
4238
4239
4240
4241
4242
4243
4244
4245
4246
4247
4248
4249
4250
4251
4252
4253
4254
4255
4256
4257
4258
4259
....
4260
4261
4262
4263
4264
4265
4266
4267
4268
4269
4270
4271
4272
4273
4274
4275
....
4277
4278
4279
4280
4281
4282
4283
4284
4285
4286
4287
4288
4289
4290
4291
4292
4293
....
4306
4307
4308
4309
4310
4311
4312
4313
4314
4315
4316
4317
4318
4319
4320
....
4342
4343
4344
4345
4346
4347
4348
4349
4350
4351
4352
4353

4354
4355
4356
4357
4358
4359
4360
4361
4362
4363
4364
4365
4366
4367
4368
4369
4370
4371
4372
4373
4374
4375
4376
4377
4378

4379
4380
4381
4382
4383
4384
4385
4386
4387
4388
4389
4390
4391
4392
4393
4394
4395
4396
4397
4398
4399

4400
4401
4402
4403
4404
4405
4406
4407
4408
4409
4410
4411
4412
4413
4414
4415
4416
4417
4418
4419
4420
4421
4422
....
4423
4424
4425
4426
4427
4428
4429

4430
4431
4432
4433
4434
4435
4436
4437
4438
4439
4440
4441
4442
4443
4444
4445
4446
4447
4448
4449
4450
4451
4452
4453
4454
....
4458
4459
4460
4461
4462
4463
4464
4465
4466
4467
4468
4469
4470
4471
4472
4473
4474
4475
4476
4477
4478
4479
4480
4481
4482
4483
4484
4485
4486

4487
4488
4489
4490
4491
4492
4493
4494
4495
4496
4497
4498
....
4500
4501
4502
4503
4504
4505
4506
4507
4508
4509
4510
4511
4512
4513
4514
4515
4516
4517
4518
4519
4520
4521

4522
4523
4524
4525
4526
4527
4528
4529
4530
4531
4532
4533
4534
4535
4536
4537
4538

4539
4540
4541
4542
4543
4544
4545
4546
4547
4548
4549
4550
4551

4552
4553
4554
4555
4556
4557
4558
4559
4560
4561
4562
4563
4564
4565
4566
4567
4568
4569
4570
4571

4572
4573
4574
4575
4576
4577
4578
4579
4580
4581
4582
....
4583
4584
4585
4586
4587
4588
4589

4590
4591
4592
4593
4594
4595
4596
4597
4598
4599
4600
4601
4602
4603
4604
4605
4606
4607
4608
4609
4610
4611
4612
4613
4614
4615
4616
4617
4618
4619
4620
4621
4622
4623
4624
4625
4626
4627
4628
4629
4630
4631
4632
4633
4634
4635
4636
4637
4638
4639
4640
4641
4642
4643
4644
4645
4646
4647
4648
4649
4650
4651
4652
4653
4654
4655
4656
4657
4658
4659
4660
4661
4662
4663
4664
4665
4666
4667
4668
4669
4670
4671
4672
4673
4674
4675
4676
4677
4678
4679
4680
4681
4682
4683
4684
4685
4686
4687
4688
4689
4690
4691
4692
4693
4694
4695
4696
4697
4698
4699
....
4703
4704
4705
4706
4707
4708
4709
4710
4711
4712
4713
4714
4715
4716
4717
....
4725
4726
4727
4728
4729
4730
4731

4732
4733
4734
4735
4736
4737
4738
4739
4740
4741
4742
4743
4744
....
4755
4756
4757
4758
4759
4760
4761
4762
4763
4764
4765
4766
4767
4768

4769
4770
4771
4772
4773
4774
4775
4776
4777
4778
4779
4780
4781
4782
4783
4784
4785
4786
4787
4788
4789
4790
4791
4792
4793
4794
4795
4796
4797
4798
4799
4800
4801
4802
4803
4804
4805
4806
4807
4808
4809
4810
4811
4812
4813
4814
4815
4816
4817
4818
....
4832
4833
4834
4835
4836
4837
4838
4839
4840
4841
4842
4843
4844
4845
4846
4847
4848
4849
4850
4851
4852
4853
4854
4855
4856
4857
4858
4859
4860
4861
4862
4863
4864
4865
4866
4867
4868
4869
4870
4871
4872
4873
4874
4875
4876
4877
4878
4879
4880
4881
4882
4883
4884
4885
4886
4887
4888
4889
4890
4891
4892
4893
4894
4895
4896
4897
4898
4899
4900
4901
....
4912
4913
4914
4915
4916
4917
4918

4919
4920
4921
4922
4923
4924
4925
4926
4927
4928
4929
4930
4931
4932
4933
4934
4935
4936
4937
4938
4939
4940
4941
4942
4943
4944
4945
4946
4947
4948
4949
4950
4951
4952
4953
4954
4955
4956
4957
4958
4959
4960
4961
4962
4963
....
4966
4967
4968
4969
4970
4971
4972
4973
4974
4975
4976
4977
4978
4979
4980
4981
4982
4983
4984
4985
4986
4987
4988
4989
4990
4991
4992
4993
4994
4995
4996
4997
4998
4999
5000
5001
5002
5003
5004
5005
5006
5007
5008
5009
5010
5011
5012
5013
5014
5015
5016
5017
5018
5019
5020
5021
5022
5023
5024
5025
5026
5027
5028
5029
5030
5031

5032
5033
5034
5035
5036
5037
5038
5039
5040
5041
5042
5043
....
5046
5047
5048
5049
5050
5051
5052
5053
5054
5055
5056
5057
5058
5059
5060
5061
....
5087
5088
5089
5090
5091
5092
5093
5094
5095
5096
5097
5098
5099
5100
5101
....
5104
5105
5106
5107
5108
5109
5110
5111
5112
5113
5114
5115
5116
5117
5118
5119
5120
....
5121
5122
5123
5124
5125
5126
5127

5128
5129
5130
5131
5132
5133
5134
5135
5136
5137
5138
5139
5140
5141
5142
5143
5144
5145
5146
5147
5148
5149
5150
5151
5152
5153
5154
5155
5156
5157
5158
5159
5160
5161
5162
5163
5164
5165
5166
5167

5168
5169
5170
5171
5172
5173
5174
5175
5176
5177
5178
5179
5180
5181
5182
5183
5184
5185
5186
5187
5188
5189
....
5199
5200
5201
5202
5203
5204
5205
5206
5207
5208
5209
5210
5211
5212
5213
5214
5215
5216
5217
5218
5219
5220
5221
5222
5223
5224
5225
5226
5227
5228
5229
5230
5231
5232
5233
5234

5235
5236
5237
5238
5239
5240
5241
5242
5243
5244
5245
5246
5247
5248
5249
5250
5251
5252
5253
5254
5255
5256
5257
5258
5259
5260
5261
5262
5263
5264
....
5265
5266
5267
5268
5269
5270
5271
5272
5273
5274
5275
5276
5277
5278
5279
5280
5281
5282
5283
5284
5285
5286
5287
5288
5289
5290
5291
5292
5293
5294
5295
5296
5297
5298
5299
5300
5301
5302
5303
5304
5305
5306
5307
5308
5309
5310
5311
5312
5313
5314
5315
5316
5317
5318
5319
5320
5321
5322
5323
5324
5325
5326
5327
5328
5329
5330
5331
5332
5333
5334
5335
5336
5337
5338
5339
5340
5341
5342
5343
5344
5345
5346
5347
5348
5349
5350
5351
5352
5353
5354
5355
5356
5357
5358
5359
5360
5361
5362
5363
5364
5365
5366
5367
5368
5369
5370
5371
5372
5373
5374
5375
5376
5377
5378
5379
5380
5381
5382
5383
5384
5385
5386
5387
5388
5389
5390
5391
5392
5393
5394
5395
5396
5397
5398
5399
5400
5401
5402
5403
5404
5405
5406
5407
5408
5409
5410
5411
5412
5413
5414
5415
5416
5417
5418
5419
5420
5421
5422
5423
5424
5425
5426
5427
5428
5429
5430
5431
5432
5433
5434
5435
5436
5437
5438
5439
5440
5441
5442
5443
5444
5445
5446
5447
5448
5449
5450
5451
5452
5453
5454
5455
5456
5457
5458
5459
5460
5461
5462
5463
5464
5465
5466
5467
5468
5469
5470
5471
5472
5473
5474
5475
5476
5477
5478
5479
5480
5481
5482
5483
5484
5485
5486
5487
5488
5489
5490
5491
5492
5493
5494
5495
5496
5497
5498
5499
5500
5501
5502
5503
5504
5505
5506
5507
5508
5509
5510
5511
5512
5513
5514
5515
5516
5517
5518
5519
5520
5521
5522
5523
5524
5525
5526
5527
5528
5529
5530
5531
5532
5533
5534
5535
5536
5537
5538
5539
5540
5541
5542
5543
5544


5545
5546
5547
5548
5549
5550
5551
5552
5553
5554
5555
5556
5557
5558
5559

5560
5561
5562
5563
5564
5565
5566
5567
5568
5569
5570
5571
5572
5573
5574
5575
5576
5577
5578
5579
5580
5581
5582
5583
5584
5585
5586
5587
5588
5589
5590
5591
5592
5593
5594
5595
5596
5597
5598
5599
5600
5601
5602
5603
5604
5605
5606
5607
5608
5609
5610
5611
5612
5613
5614
5615
5616
5617
5618
5619
5620
5621
5622
5623
5624
5625
5626
5627
5628
5629
5630
5631
5632
5633
5634
5635
5636
5637

5638
5639
5640
5641
5642
5643
5644
5645
5646
5647
5648
5649
5650
5651
5652
5653
5654
5655
5656
5657
5658
5659
5660
5661
5662
5663
5664
5665
5666

5667
5668
5669
5670
5671
5672
5673
5674
5675
5676
5677
5678
5679
5680
5681
5682
5683
5684
5685
5686
5687

5688
5689
5690
5691
5692
5693
5694
5695
5696
5697
5698
5699
5700
5701
5702
5703
5704
5705
5706
5707
5708
5709
5710
5711
5712
5713
5714
5715
5716
5717
5718
5719
5720
5721
5722
5723
5724
5725
5726
5727
5728
5729
5730
5731
5732
5733
5734
5735
5736
5737
5738
5739
5740
5741
5742
5743
5744
5745
5746
5747
5748
5749
5750
5751
5752
5753
5754
....
5799
5800
5801
5802
5803
5804
5805
5806
5807
5808
5809
5810
5811
5812
5813
5814
....
5909
5910
5911
5912
5913
5914
5915
5916
5917
5918
5919
5920
5921
5922
5923
5924
5925
5926
5927
5928
5929
5930
5931
5932
5933
5934
5935
5936
....
5942
5943
5944
5945
5946
5947
5948
5949
5950
5951
5952
5953
5954
5955
5956
5957
....
5959
5960
5961
5962
5963
5964
5965
5966
5967
5968
5969
5970
5971
5972
5973
5974
....
5990
5991
5992
5993
5994
5995
5996
5997
5998
5999
6000
6001
6002
6003
6004
6005
6006
6007
6008
6009
6010
6011
6012
6013
6014
6015
6016
6017
6018
6019
6020
6021
6022
6023
6024
6025
6026
6027
....
6046
6047
6048
6049
6050
6051
6052
6053
6054
6055
6056
6057
6058
6059
6060
6061
....
6151
6152
6153
6154
6155
6156
6157
6158
6159
6160
6161
6162
6163
6164
6165
6166
....
6180
6181
6182
6183
6184
6185
6186
6187
6188
6189
6190
6191
6192
6193
6194
6195
....
6361
6362
6363
6364
6365
6366
6367
6368
6369
6370
6371
6372
6373
6374
6375
6376
....
6398
6399
6400
6401
6402
6403
6404
6405
6406
6407
6408
6409
6410
6411
6412
6413
....
6424
6425
6426
6427
6428
6429
6430
6431
6432
6433
6434
6435
6436
6437
6438
6439
....
6464
6465
6466
6467
6468
6469
6470
6471
6472
6473
6474
6475
6476
6477
6478
6479
6480
6481
6482
6483
6484
6485
6486
....
6495
6496
6497
6498
6499
6500
6501
6502
6503
6504
6505
6506
6507
6508
6509
6510
....
6617
6618
6619
6620
6621
6622
6623
6624
6625
6626
6627
6628
6629
6630
6631
6632
6633
6634
6635
6636
6637
6638
6639
6640
6641
6642
6643
6644
6645
6646
6647
6648
6649
6650
6651
....
6665
6666
6667
6668
6669
6670
6671
6672
6673
6674
6675
6676
6677
6678
6679
6680
....
6693
6694
6695
6696
6697
6698
6699

6700
6701
6702
6703
6704
6705
6706
6707
6708
6709
6710
....
6947
6948
6949
6950
6951
6952
6953
6954
6955
6956
6957
6958
6959
6960
6961
6962
....
7001
7002
7003
7004
7005
7006
7007
7008
7009
7010
7011
7012
7013
7014
7015
7016
7017
7018
7019
7020
7021
7022
7023
7024
7025
....
7092
7093
7094
7095
7096
7097
7098
7099
7100
7101
7102
7103
7104
7105
7106
7107
....
7114
7115
7116
7117
7118
7119
7120
7121
7122
7123
7124
7125
7126
7127
7128
7129
....
7136
7137
7138
7139
7140
7141
7142
7143
7144
7145
7146
7147
7148
7149
7150
7151
7152
7153
7154
7155
7156
7157
7158
7159
7160
7161
7162
7163
7164
7165
7166
7167
7168
7169
7170
7171
7172
7173
7174
7175
7176
7177
7178
7179
7180
7181
7182
7183
7184
....
7446
7447
7448
7449
7450
7451
7452
7453
7454
7455
7456
7457
7458
7459
7460
7461
....
7462
7463
7464
7465
7466
7467
7468
7469
7470
7471
7472
7473
7474
7475
7476
7477
....
7478
7479
7480
7481
7482
7483
7484
7485
7486
7487
7488
7489
7490
7491
7492
7493
7494
7495
7496
7497
7498
7499
7500
7501
7502
7503
7504
7505
7506
7507
7508
7509
7510
7511
7512
7513
7514
7515
7516
....
7517
7518
7519
7520
7521
7522
7523
7524
7525
7526
7527
7528
7529
7530
7531
7532
7533
....
7540
7541
7542
7543
7544
7545
7546
7547
7548
7549
7550
7551
7552
7553
7554
7555
....
7571
7572
7573
7574
7575
7576
7577
7578
7579
7580
7581
7582
7583
7584
7585
7586
7587
7588
7589
7590
7591
7592
7593
7594
7595
7596
7597
7598
7599
7600
7601
7602
7603
7604
7605
7606
7607
....
7619
7620
7621
7622
7623
7624
7625
7626
7627
7628
7629
7630
7631
7632
7633
7634
7635
7636
7637
7638
7639
7640
7641
7642
7643
7644
7645
7646
7647
7648
7649
7650
7651
7652
7653
7654
7655
7656
7657
7658
7659
7660
7661
7662
7663
7664
....
7678
7679
7680
7681
7682
7683
7684
7685
7686
7687
7688
7689
7690
7691
7692
7693
7694
7695
7696
7697
7698
7699
7700
7701
7702
7703
....
8922
8923
8924
8925
8926
8927
8928
<div class="p"><!----></div>

<h3 align="center">Tomasz Grysztar </h3>

<h1 align="center">flat assembler 1.72<br /><span class="small">Programmer's Manual</span> </h1>

<h3 align="center"> </h3>


<div class="p"><!----></div>
 <a id="tth_chAp1"></a><h1>
Chapter 1 <br />Introduction</h1>
................................................................................

<h4>Movement:</h4>

<div class="p"><!----></div>
<a id="tth_tAb1.1">
</a> 
<div style="text-align:center">      
<table class="tabular">
<tr><td width="158">Left arrow       </td><td align="left">move one character left </td></tr>
<tr><td width="158">Right arrow      </td><td align="left">move one character right </td></tr>
<tr><td width="158">Up arrow         </td><td align="left">move one line up </td></tr>
<tr><td width="158">Down arrow       </td><td align="left">move one line down </td></tr>
<tr><td width="158">Ctrl+Left arrow  </td><td align="left">move one word left </td></tr>
<tr><td width="158">Ctrl+Right arrow </td><td align="left">move one word right </td></tr>
<tr><td width="158">Home             </td><td align="left">move to the beginning of line </td></tr>
................................................................................

<h4>Editing:</h4>

<div class="p"><!----></div>
<a id="tth_tAb1.1">
</a> 
<div style="text-align:center">      
<table class="tabular">
<tr><td width="158">Insert         </td><td align="left">switch insert/overwrite mode </td></tr>
<tr><td width="158">Alt+Insert     </td><td align="left">switch horizontal/vertical blocks </td></tr>
<tr><td width="158">Delete         </td><td align="left">delete current character </td></tr>
<tr><td width="158">Backspace      </td><td align="left">delete previous character </td></tr>
<tr><td width="158">Ctrl+Backspace </td><td align="left">delete previous word </td></tr>
<tr><td width="158">Alt+Backspace  </td><td align="left">undo previous operation (also Ctrl+Z) </td></tr>
<tr><td width="158">Alt+Shift+Backspace  </td><td align="left">redo previously undone operation (also Ctrl+Shift+Z) </td></tr>
<tr><td width="158">Ctrl+Y         </td><td align="left">delete current line </td></tr>
<tr><td width="158">F6             </td><td align="left">duplicate current line </td></tr>
<tr><td width="158"></td></tr></table>
</div>
<div class="p"><!----></div>

<h4>Block operations:</h4>

<div class="p"><!----></div>
<a id="tth_tAb1.1">
</a> 
<div style="text-align:center">      
<table class="tabular">
<tr><td width="158">Ctrl+Insert  </td><td align="left">copy block into clipboard (also Ctrl+C) </td></tr>
<tr><td width="158">Shift+Insert </td><td align="left">paste block from the clipboard (also Ctrl+V) </td></tr>
<tr><td width="158">Ctrl+Delete  </td><td align="left">delete block </td></tr>
<tr><td width="158">Shift+Delete </td><td align="left">cut block into clipboard (also Ctrl+X) </td></tr>
<tr><td width="158">Ctrl+A       </td><td align="left">select all text </td></tr>
<tr><td width="158"></td></tr></table>
</div>
................................................................................

<h4>Search:</h4>

<div class="p"><!----></div>
<a id="tth_tAb1.1">
</a> 
<div style="text-align:center">      
<table class="tabular">
<tr><td width="158">F5       </td><td align="left">go to specified position (also Ctrl+G) </td></tr>
<tr><td width="158">F7       </td><td align="left">find (also Ctrl+F) </td></tr>
<tr><td width="158">Shift+F7 </td><td align="left">find next (also F3) </td></tr>
<tr><td width="158">Ctrl+F7  </td><td align="left">replace (also Ctrl+H) </td></tr>
<tr><td width="158"></td></tr></table>
</div>
<div class="p"><!----></div>
................................................................................

<h4>Compile:</h4>

<div class="p"><!----></div>
<a id="tth_tAb1.1">
</a> 
<div style="text-align:center">      
<table class="tabular">
<tr><td width="158">F9       </td><td align="left">compile and run </td></tr>
<tr><td width="158">Ctrl+F9  </td><td align="left">compile only </td></tr>
<tr><td width="158">Shift+F9 </td><td align="left">assign current file as main file to compile </td></tr>
<tr><td width="158">Ctrl+F8  </td><td align="left">compile and build symbols information </td></tr>
<tr><td width="158"></td></tr></table>
</div>
<div class="p"><!----></div>
................................................................................

<h4>Other keys:</h4>

<div class="p"><!----></div>
<a id="tth_tAb1.1">
</a> 
<div style="text-align:center">      
<table class="tabular">
<tr><td width="158">F2              </td><td align="left">save current file </td></tr>
<tr><td width="158">Shift+F2        </td><td align="left">save file under a new name </td></tr>
<tr><td width="158">F4              </td><td align="left">load file </td></tr>
<tr><td width="158">Ctrl+N          </td><td align="left">create new file </td></tr>
<tr><td width="158">Ctrl+Tab        </td><td align="left">switch to next file </td></tr>
<tr><td width="158">Ctrl+Shift+Tab  </td><td align="left">switch to previous file </td></tr>
<tr><td width="158">Alt+[1-9]       </td><td align="left">switch to file of given number </td></tr>
................................................................................
<tr><td width="158">Esc             </td><td align="left">close current file </td></tr>
<tr><td width="158">Alt+X           </td><td align="left">close all files and exit </td></tr>
<tr><td width="158">Ctrl+F6         </td><td align="left">calculator </td></tr>
<tr><td width="158">Alt+Left arrow  </td><td align="left">scroll left </td></tr>
<tr><td width="158">Alt+Right arrow </td><td align="left">scroll right </td></tr>
<tr><td width="158">Alt+Up arrow    </td><td align="left">scroll up </td></tr>
<tr><td width="158">Alt+Down arrow  </td><td align="left">scroll down </td></tr>

<tr><td width="158"></td></tr></table>
</div>
<div class="p"><!----></div>

<h4>Specific keys:</h4>

<div class="p"><!----></div>
<a id="tth_tAb1.1">
</a> 
<div style="text-align:center">      
<table class="tabular">
<tr><td width="158">F1     </td><td align="left">search for keyword in selected help file </td></tr>
<tr><td width="158">Alt+F1 </td><td align="left">contents of selected help file </td></tr>
<tr><td width="158"></td></tr></table>
</div>
<div class="p"><!----></div>
     <a id="tth_sEc1.1.4"></a><h3>
1.1.4&nbsp;&nbsp;Editor options</h3>
................................................................................
editor the so-called
dead keys (keys that don't immediately generate the character, but wait for the next key
to decide what character to put - usually you enter the character of a dead key by
pressing the space key after it). This may be useful if the key for a character that
you often need to enter into assembly source is a dead key and you don't need this
functionality for writing programs.

<div class="p"><!----></div>
<em>Time scrolling</em> - with this option enabled it is possible to use mouse wheel
to scroll through the undo/redo space while either AltGr or Ctrl+Alt keys are pressed.

<div class="p"><!----></div>
     <a id="tth_sEc1.1.5"></a><h3>
1.1.5&nbsp;&nbsp;Executing compiler from command line</h3>
To compile from the command line, execute
the <tt>fasm.exe</tt> executable with two parameters: the first
should be the name of the source file, the second the name of the
destination file. If no second parameter is given, the name for
................................................................................
As stated above, after a successful compilation the
compiler displays the compilation summary. It reports
how many passes were done, how much time they took,
and how many bytes were written into the destination file. The
following is an example of the compilation summary:

<pre>
flat&nbsp;assembler&nbsp;&nbsp;version&nbsp;1.72&nbsp;(16384&nbsp;kilobytes&nbsp;memory)
38&nbsp;passes,&nbsp;5.3&nbsp;seconds,&nbsp;77824&nbsp;bytes.

</pre>
If an error occurs during the compilation process, the program will
display an error message. For example, when the compiler can't find
the input file, it will display the following message:

<pre>
flat&nbsp;assembler&nbsp;&nbsp;version&nbsp;1.72&nbsp;(16384&nbsp;kilobytes&nbsp;memory)
error:&nbsp;source&nbsp;file&nbsp;not&nbsp;found.

</pre>
If the error is connected with a specific part of the source code, the
source line that caused the error will also be displayed, together
with the position of this line in the source to help you find
the error, for example:

<pre>
flat&nbsp;assembler&nbsp;&nbsp;version&nbsp;1.72&nbsp;(16384&nbsp;kilobytes&nbsp;memory)
example.asm&nbsp;[3]:
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;mob&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;ax,1
error:&nbsp;illegal&nbsp;instruction.

</pre>
It means that in the third line of the <tt>example.asm</tt> file
the compiler has encountered an unrecognized instruction. When the
line that caused the error contains a macroinstruction, the line
in the macroinstruction definition that generated the erroneous
instruction is also displayed:

<pre>
flat&nbsp;assembler&nbsp;&nbsp;version&nbsp;1.72&nbsp;(16384&nbsp;kilobytes&nbsp;memory)
example.asm&nbsp;[6]:
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;stoschar&nbsp;7
example.asm&nbsp;[3]&nbsp;stoschar&nbsp;[1]:
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;mob&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;al,char
error:&nbsp;illegal&nbsp;instruction.

</pre>
................................................................................

<div class="p"><!----></div>

<div class="p"><!----></div>
<a id="tth_tAb1.1">
</a> 
<div style="text-align:center">
<table border="1" class="tabular">
<tr><td align="center">Operator </td><td align="center">Bits </td><td align="center">Bytes </td></tr><tr><td></td></tr>
<tr><td align="center"><tt>byte</tt> </td><td align="center">8 </td><td align="center">1 </td></tr>
<tr><td align="center"><tt>word</tt> </td><td align="center">16 </td><td align="center">2 </td></tr>
<tr><td align="center"><tt>dword</tt> </td><td align="center">32 </td><td align="center">4 </td></tr>
<tr><td align="center"><tt>fword</tt> </td><td align="center">48 </td><td align="center">6 </td></tr>
<tr><td align="center"><tt>pword</tt> </td><td align="center">48 </td><td align="center">6 </td></tr>
<tr><td align="center"><tt>qword</tt> </td><td align="center">64 </td><td align="center">8 </td></tr>
................................................................................

<div class="p"><!----></div>

<div class="p"><!----></div>
<a id="tth_tAb1.2">
</a> 
<div style="text-align:center">
<table border="1" class="tabular">
<tr><td align="center">Type </td><td align="center">Bits </td><td align="center"></td><td align="center"></td><td align="center"></td><td align="center"></td><td align="center"></td><td align="center"></td><td align="center"></td><td align="center"></td></tr><tr><td></td></tr>
<tr><td align="center"></td><td align="center">8 </td><td align="center"><tt>al</tt> </td><td align="center"><tt>cl</tt> </td><td align="center"><tt>dl</tt> </td><td align="center"><tt>bl</tt> </td><td align="center"><tt>ah</tt> </td><td align="center"><tt>ch</tt> </td><td align="center"><tt>dh</tt> </td><td align="center"><tt>bh</tt> </td></tr>
<tr><td align="center">General </td><td align="center">16 </td><td align="center"><tt>ax</tt> </td><td align="center"><tt>cx</tt> </td><td align="center"><tt>dx</tt> </td><td align="center"><tt>bx</tt> </td><td align="center"><tt>sp</tt> </td><td align="center"><tt>bp</tt> </td><td align="center"><tt>si</tt> </td><td align="center"><tt>di</tt> </td></tr>
<tr><td align="center"></td><td align="center">32 </td><td align="center"><tt>eax</tt> </td><td align="center"><tt>ecx</tt> </td><td align="center"><tt>edx</tt> </td><td align="center"><tt>ebx</tt> </td><td align="center"><tt>esp</tt> </td><td align="center"><tt>ebp</tt> </td><td align="center"><tt>esi</tt> </td><td align="center"><tt>edi</tt> </td></tr>
<tr><td align="center">Segment </td><td align="center">16 </td><td align="center"><tt>es</tt> </td><td align="center"><tt>cs</tt> </td><td align="center"><tt>ss</tt> </td><td align="center"><tt>ds</tt> </td><td align="center"><tt>fs</tt> </td><td align="center"><tt>gs</tt> </td><td align="center"></td><td align="center"></td></tr>
<tr><td align="center">Control </td><td align="center">32 </td><td align="center"><tt>cr0</tt> </td><td align="center"></td><td align="center"><tt>cr2</tt> </td><td align="center"><tt>cr3</tt> </td><td align="center"><tt>cr4</tt> </td><td align="center"></td><td align="center"></td><td align="center"></td></tr>
<tr><td align="center">Debug </td><td align="center">32 </td><td align="center"><tt>dr0</tt> </td><td align="center"><tt>dr1</tt> </td><td align="center"><tt>dr2</tt> </td><td align="center"><tt>dr3</tt> </td><td align="center"></td><td align="center"></td><td align="center"><tt>dr6</tt> </td><td align="center"><tt>dr7</tt> </td></tr>
<tr><td align="center">FPU </td><td align="center">80 </td><td align="center"><tt>st0</tt> </td><td align="center"><tt>st1</tt> </td><td align="center"><tt>st2</tt> </td><td align="center"><tt>st3</tt> </td><td align="center"><tt>st4</tt> </td><td align="center"><tt>st5</tt> </td><td align="center"><tt>st6</tt> </td><td align="center"><tt>st7</tt> </td></tr>
<tr><td align="center">MMX </td><td align="center">64 </td><td align="center"><tt>mm0</tt> </td><td align="center"><tt>mm1</tt> </td><td align="center"><tt>mm2</tt> </td><td align="center"><tt>mm3</tt> </td><td align="center"><tt>mm4</tt> </td><td align="center"><tt>mm5</tt> </td><td align="center"><tt>mm6</tt> </td><td align="center"><tt>mm7</tt> </td></tr>
<tr><td align="center">SSE </td><td align="center">128 </td><td align="center"><tt>xmm0</tt> </td><td align="center"><tt>xmm1</tt> </td><td align="center"><tt>xmm2</tt> </td><td align="center"><tt>xmm3</tt> </td><td align="center"><tt>xmm4</tt> </td><td align="center"><tt>xmm5</tt> </td><td align="center"><tt>xmm6</tt> </td><td align="center"><tt>xmm7</tt> </td></tr>
<tr><td align="center">AVX </td><td align="center">256 </td><td align="center"><tt>ymm0</tt> </td><td align="center"><tt>ymm1</tt> </td><td align="center"><tt>ymm2</tt> </td><td align="center"><tt>ymm3</tt> </td><td align="center"><tt>ymm4</tt> </td><td align="center"><tt>ymm5</tt> </td><td align="center"><tt>ymm6</tt> </td><td align="center"><tt>ymm7</tt> </td></tr>
<tr><td align="center">AVX-512 </td><td align="center">512 </td><td align="center"><tt>zmm0</tt> </td><td align="center"><tt>zmm1</tt> </td><td align="center"><tt>zmm2</tt> </td><td align="center"><tt>zmm3</tt> </td><td align="center"><tt>zmm4</tt> </td><td align="center"><tt>zmm5</tt> </td><td align="center"><tt>zmm6</tt> </td><td align="center"><tt>zmm7</tt> </td></tr>
<tr><td align="center">Opmask </td><td align="center">64 </td><td align="center"><tt>k0</tt> </td><td align="center"><tt>k1</tt> </td><td align="center"><tt>k2</tt> </td><td align="center"><tt>k3</tt> </td><td align="center"><tt>k4</tt> </td><td align="center"><tt>k5</tt> </td><td align="center"><tt>k6</tt> </td><td align="center"><tt>k7</tt> </td></tr>
<tr><td align="center">Bounds </td><td align="center">128 </td><td align="center"><tt>bnd0</tt> </td><td align="center"><tt>bnd1</tt> </td><td align="center"><tt>bnd2</tt> </td><td align="center"><tt>bnd3</tt> </td><td align="center"></td><td align="center"></td><td align="center"></td><td align="center"></td></tr></table>
</div>

<div style="text-align:center">Table 1.2: Registers.</div>
<a id="tab:registers">
</a>

<div class="p"><!----></div>
................................................................................
addressing, segment register name followed with a colon should be put just
before the address value (inside the square brackets or after the <tt>ptr</tt>
operator).
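
<div class="p"><!----></div>
As a brief illustration of the segment override described above (the registers are chosen arbitrarily for this sketch, and 16-bit addressing is assumed), the segment register name and colon go inside the square brackets, just before the address:

<pre>
mov&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;ax,[es:bx]&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;;&nbsp;read&nbsp;a&nbsp;word&nbsp;from&nbsp;the&nbsp;segment&nbsp;selected&nbsp;by&nbsp;es
mov&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;byte&nbsp;[ds:si],0&nbsp;&nbsp;&nbsp;;&nbsp;store&nbsp;a&nbsp;zero&nbsp;byte&nbsp;through&nbsp;ds

</pre>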

<div class="p"><!----></div>
     <a id="tth_sEc1.2.2"></a><h3>
1.2.2&nbsp;&nbsp;Data definitions</h3>
<a class="a" id="DB"></a>
<a class="a" id="RB"></a>
<a class="a" id="DW"></a>
<a class="a" id="DU"></a>
<a class="a" id="RW"></a>
<a class="a" id="DP"></a>
<a class="a" id="RP"></a>
<a class="a" id="DF"></a>
<a class="a" id="RF"></a>


<a class="a" id="DD"></a>
<a class="a" id="RD"></a>
<a class="a" id="DQ"></a>
<a class="a" id="RQ"></a>
<a class="a" id="DT"></a>
<a class="a" id="RT"></a>


To define data or reserve space for it, use one of the directives listed
in table 1.3. The data definition directive should be
followed by one or more numerical expressions, separated with commas.
These expressions define the values for data cells whose size depends on which
directive is used. For example <tt>db&nbsp;1,2,3</tt> will define three bytes with
the values 1, 2 and 3 respectively.
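
<div class="p"><!----></div>
A minimal sketch of such definitions follows; the label names <tt>values8</tt>, <tt>values16</tt> and <tt>values32</tt> are hypothetical, chosen only for this illustration. The same list of expressions produces cells of a different size depending on the directive:

<pre>
values8&nbsp;&nbsp;&nbsp;db&nbsp;1,2,3&nbsp;&nbsp;&nbsp;&nbsp;;&nbsp;three&nbsp;bytes
values16&nbsp;&nbsp;dw&nbsp;1,2,3&nbsp;&nbsp;&nbsp;&nbsp;;&nbsp;three&nbsp;16-bit&nbsp;words&nbsp;(6&nbsp;bytes)
values32&nbsp;&nbsp;dd&nbsp;1,2,3&nbsp;&nbsp;&nbsp;&nbsp;;&nbsp;three&nbsp;32-bit&nbsp;double&nbsp;words&nbsp;(12&nbsp;bytes)

</pre>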

................................................................................
make multiple copies of given values. The count of duplicates should precede
this operator and the value to duplicate should follow - it can even be a
chain of values separated with commas, but such a set of values needs to be
enclosed in parentheses, like <tt>db&nbsp;5&nbsp;dup&nbsp;(1,2)</tt>, which defines five copies
of the given two-byte sequence.
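
<div class="p"><!----></div>
To make the effect of <tt>dup</tt> concrete, here is a small sketch (the labels <tt>pairs</tt> and <tt>zeros</tt> are hypothetical names used only for this example):

<pre>
pairs&nbsp;&nbsp;db&nbsp;5&nbsp;dup&nbsp;(1,2)&nbsp;&nbsp;&nbsp;;&nbsp;ten&nbsp;bytes:&nbsp;1,2,1,2,1,2,1,2,1,2
zeros&nbsp;&nbsp;dw&nbsp;4&nbsp;dup&nbsp;(0)&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;;&nbsp;four&nbsp;words,&nbsp;each&nbsp;equal&nbsp;to&nbsp;zero

</pre>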

<div class="p"><!----></div>
<a class="a" id="FILE"></a>
The <tt>file</tt> directive is special and its syntax is different. This
directive includes a chain of bytes from a file and it should be followed by
the quoted file name, then optionally a colon and a numerical expression specifying
the offset in the file, then - also optionally - a comma and a numerical
expression specifying the count of bytes to include (if no count is specified,
all data up to the end of the file is included). For example <tt>file&nbsp;'data.bin'</tt> will
include the whole file as binary data and <tt>file&nbsp;'data.bin':10h,4</tt> will include
only four bytes starting at offset 10h.
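
<div class="p"><!----></div>
One possible use, sketched under the assumption that a file named <tt>data.bin</tt> exists next to the source (the labels <tt>image</tt>, <tt>image_size</tt> and <tt>header</tt> are hypothetical), is to embed binary data and compute its length from the current address:

<pre>
image:
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;file&nbsp;'data.bin'&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;;&nbsp;the&nbsp;whole&nbsp;file
image_size&nbsp;=&nbsp;$&nbsp;-&nbsp;image&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;;&nbsp;number&nbsp;of&nbsp;bytes&nbsp;just&nbsp;included
header:
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;file&nbsp;'data.bin':10h,4&nbsp;&nbsp;&nbsp;;&nbsp;four&nbsp;bytes&nbsp;starting&nbsp;at&nbsp;offset&nbsp;10h

</pre>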
................................................................................

<div class="p"><!----></div>

<div class="p"><!----></div>
<a id="tth_tAb1.3">
</a> 
<div style="text-align:center">
<table border="1" class="tabular">
<tr><td align="center">Size </td><td align="center">Define </td><td align="center">Reserve </td></tr>
<tr><td align="center">(bytes) </td><td align="center">data </td><td align="center">data </td></tr><tr><td></td></tr>
<tr><td align="center">1 </td><td align="center"><tt>db</tt> </td><td align="center"><tt>rb</tt> </td></tr>
<tr><td align="center"></td><td align="center"><tt>file</tt> </td><td align="center"></td></tr>
<tr><td align="center">2 </td><td align="center"><tt>dw</tt> </td><td align="center"><tt>rw</tt> </td></tr>
<tr><td align="center"></td><td align="center"><tt>du</tt> </td><td align="center"></td></tr>
<tr><td align="center">4 </td><td align="center"><tt>dd</tt> </td><td align="center"><tt>rd</tt> </td></tr>
................................................................................
any place of source (even before it was defined). Constant can be redefined
many times, but in this case it is accessible only after it was defined, and
is always equal to the value from last definition before the place where it's
used. When a constant is defined only once in source, it is - like the label -
accessible from anywhere.

<div class="p"><!----></div>
<a class="a" id="_"></a>
The definition of constant consists of name of the constant followed by the
<tt>=</tt> character and numerical expression, which after calculation will
become the value of constant. This value is always calculated at the time the
constant is defined. For example you can define <tt>count</tt> constant by
using the directive <tt>count&nbsp;=&nbsp;17</tt>, and then use it in the assembly
instructions, like <tt>mov&nbsp;cx,count</tt> - which will become <tt>mov&nbsp;cx,17</tt>
during the compilation process.
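
<div class="p"><!----></div>
The redefinition rule described above can be illustrated with a minimal sketch
(the values are chosen only for this example): each use of the constant takes
the value from the last definition preceding it.

<pre>
&nbsp;&nbsp;&nbsp;&nbsp;count&nbsp;=&nbsp;17
&nbsp;&nbsp;&nbsp;&nbsp;mov&nbsp;cx,count&nbsp;&nbsp;&nbsp;&nbsp;;&nbsp;assembled&nbsp;as&nbsp;mov&nbsp;cx,17
&nbsp;&nbsp;&nbsp;&nbsp;count&nbsp;=&nbsp;18
&nbsp;&nbsp;&nbsp;&nbsp;mov&nbsp;dx,count&nbsp;&nbsp;&nbsp;&nbsp;;&nbsp;assembled&nbsp;as&nbsp;mov&nbsp;dx,18

</pre>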

................................................................................
compares the sizes of operands, which should be equal. You can force
assembling that instruction by using size override:
<tt>mov&nbsp;ax,word&nbsp;[char]</tt>, but remember that this instruction will read the
two bytes beginning at <tt>char</tt> address, while it was defined as a single
byte.

<div class="p"><!----></div>
<a class="a" id="LABEL"></a>
The last and the most flexible way to define labels is to use <tt>label</tt>
directive. This directive should be followed by the name of label, then
optionally size operator and then - also optionally - <tt>at</tt> operator and
the numerical expression defining the address at which this label should be
defined. For example <tt>label&nbsp;wchar&nbsp;word&nbsp;at&nbsp;char</tt> will define a new label
for the 16-bit data at the address of <tt>char</tt>. Now the instruction
<tt>mov&nbsp;ax,[wchar]</tt> will be after compilation the same as
<tt>mov&nbsp;ax,word&nbsp;[char]</tt>. If no address is specified, <tt>label</tt> directive
................................................................................
constants or labels. But they can be more complex, by using the arithmetical
or logical operators for calculations at compile time. All these operators
with their priority values are listed in table 1.4.
The operations with higher priority value will be calculated first; you can
of course change this behavior by putting some parts of expression into
parentheses. The <tt>+</tt>, <tt>-</tt>, <tt>*</tt> and <tt>/</tt> are standard
arithmetical operations, <tt>mod</tt> calculates the remainder from division.
The <tt>and</tt>, <tt>or</tt>, <tt>xor</tt>, <tt>shl</tt>, <tt>shr</tt>, <tt>bsf</tt>, <tt>bsr</tt> and <tt>not</tt>
perform the same bit-logical operations as assembly instructions of those names.
The <tt>rva</tt> and <tt>plt</tt> are special unary operators that perform
conversions between different kinds of addresses, they can be used only with
few of the output formats and their meaning may vary (see ).

<div class="p"><!----></div>
The arithmetical and bit-logical calculations are usually processed as if they
operated on infinite precision 2-adic numbers, and assembler signals an
overflow error if because of its limitations it is not able to perform the
required calculation, or if the result is too large a number to fit in either
signed or unsigned range for the destination unit size.





<div class="p"><!----></div>

<div class="p"><!----></div>
<a id="tth_tAb1.4">
</a> 
<div style="text-align:center">
<table border="1" class="tabular">
<tr><td align="center">Priority </td><td align="center">Operators </td></tr><tr><td></td></tr>
<tr><td align="center">0 </td><td align="center"><tt>+</tt> </td></tr>
<tr><td align="center"></td><td align="center"><tt>-</tt> </td></tr>
<tr><td align="center">1 </td><td align="center"><tt>*</tt> </td></tr>
<tr><td align="center"></td><td align="center"><tt>/</tt> </td></tr>
<tr><td align="center">2 </td><td align="center"><tt>mod</tt> </td></tr>
<tr><td align="center">3 </td><td align="center"><tt>and</tt> </td></tr>
<tr><td align="center"></td><td align="center"><tt>or</tt> </td></tr>
<tr><td align="center"></td><td align="center"><tt>xor</tt> </td></tr>
<tr><td align="center">4 </td><td align="center"><tt>shl</tt> </td></tr>
<tr><td align="center"></td><td align="center"><tt>shr</tt> </td></tr>
<tr><td align="center">5 </td><td align="center"><tt>not</tt> </td></tr>
<tr><td align="center">6 </td><td align="center"><tt>bsf</tt> </td></tr>
<tr><td align="center"></td><td align="center"><tt>bsr</tt> </td></tr>
<tr><td align="center">7 </td><td align="center"><tt>rva</tt> </td></tr>
<tr><td align="center"></td><td align="center"><tt>plt</tt> </td></tr></table>
</div>

<div style="text-align:center">Table 1.4: Arithmetical and bit-logical operators by priority.</div>
<a id="tab:operators_priority">
</a>
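
<div class="p"><!----></div>
As a small sketch of how the priorities from table 1.4 affect the result, the
following data definitions use illustrative values only; the results given in
the comments follow from the priority values listed in the table.

<pre>
&nbsp;&nbsp;&nbsp;&nbsp;dd&nbsp;1+2*3&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;;&nbsp;7,&nbsp;*&nbsp;has&nbsp;higher&nbsp;priority&nbsp;than&nbsp;+
&nbsp;&nbsp;&nbsp;&nbsp;dd&nbsp;(1+2)*3&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;;&nbsp;9,&nbsp;the&nbsp;parenthesized&nbsp;part&nbsp;is&nbsp;calculated&nbsp;first
&nbsp;&nbsp;&nbsp;&nbsp;dd&nbsp;2+3&nbsp;shl&nbsp;1&nbsp;&nbsp;&nbsp;&nbsp;;&nbsp;8,&nbsp;shl&nbsp;has&nbsp;higher&nbsp;priority&nbsp;than&nbsp;+
&nbsp;&nbsp;&nbsp;&nbsp;dd&nbsp;1&nbsp;shl&nbsp;4&nbsp;or&nbsp;1&nbsp;;&nbsp;17,&nbsp;shl&nbsp;has&nbsp;higher&nbsp;priority&nbsp;than&nbsp;or

</pre>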

<div class="p"><!----></div>
The numbers in the expression are by default treated as decimal; binary
numbers should have the <tt>b</tt> letter attached at the end, octal numbers
should end with <tt>o</tt> letter, hexadecimal numbers should begin with <tt>0x</tt> characters
................................................................................
of the segment register is also a mnemonic of instruction prefix, although it
is recommended to use segment overrides inside the square brackets instead of
these prefixes.

<div class="p"><!----></div>
     <a id="tth_sEc2.1.1"></a><h3>
2.1.1&nbsp;&nbsp;Data movement instructions</h3>
<a class="a" id="mov"></a>

<tt>mov</tt> transfers a byte, word or double word from the source operand to
the destination operand. It can transfer data between general registers, from
the general register to memory, or from memory to general register, but it
cannot move from memory to memory. It can also transfer an immediate value to
general register or memory, segment register to general register or memory,
general register or memory to segment register, control or debug register to
general register and general register to control or debug register. The
................................................................................
&nbsp;&nbsp;&nbsp;&nbsp;mov&nbsp;ds,[bx]&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;;&nbsp;memory&nbsp;to&nbsp;segment&nbsp;register
&nbsp;&nbsp;&nbsp;&nbsp;mov&nbsp;eax,cr0&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;;&nbsp;control&nbsp;register&nbsp;to&nbsp;general&nbsp;register
&nbsp;&nbsp;&nbsp;&nbsp;mov&nbsp;cr3,ebx&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;;&nbsp;general&nbsp;register&nbsp;to&nbsp;control&nbsp;register

</pre>

<div class="p"><!----></div>
<a class="a" id="xchg"></a>
<tt>xchg</tt> swaps the contents of two operands. It can swap two byte
operands, two word operands or two double word operands. Order of operands is
not important. The operands may be two general registers, or general register
with memory. For example:

<pre>
&nbsp;&nbsp;&nbsp;&nbsp;xchg&nbsp;ax,bx&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;;&nbsp;swap&nbsp;two&nbsp;general&nbsp;registers
&nbsp;&nbsp;&nbsp;&nbsp;xchg&nbsp;al,[char]&nbsp;&nbsp;;&nbsp;swap&nbsp;register&nbsp;with&nbsp;memory

</pre>

<div class="p"><!----></div>
<a class="a" id="push"></a>
<a class="a" id="pushw"></a>
<a class="a" id="pushd"></a>
<tt>push</tt> decrements the stack pointer (<tt>esp</tt> register), then
transfers the operand to the top of stack indicated by <tt>esp</tt>. The
operand can be memory, general register, segment register or immediate value
of word or double word size. If operand is an immediate value and no size is
specified, it is by default treated as a word value if assembler is in
16-bit mode and as a double word value if assembler is in 32-bit mode.
<tt>pushw</tt> and <tt>pushd</tt> mnemonics are variants of this instruction that
store the values of word or double word size respectively. If more operands
................................................................................
&nbsp;&nbsp;&nbsp;&nbsp;push&nbsp;es&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;;&nbsp;store&nbsp;segment&nbsp;register
&nbsp;&nbsp;&nbsp;&nbsp;pushw&nbsp;[bx]&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;;&nbsp;store&nbsp;memory
&nbsp;&nbsp;&nbsp;&nbsp;push&nbsp;1000h&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;;&nbsp;store&nbsp;immediate&nbsp;value

</pre>

<div class="p"><!----></div>
<a class="a" id="pusha"></a>
<a class="a" id="pushaw"></a>
<a class="a" id="pushad"></a>
<tt>pusha</tt> saves the contents of the eight general registers on the stack.
This instruction has no operands. There are two versions of this instruction,
one 16-bit and one 32-bit; assembler automatically generates the right
version for current mode, but it can be overridden by using <tt>pushaw</tt>
or <tt>pushad</tt> mnemonic to always get the 16-bit or 32-bit version.
The 16-bit version of this instruction pushes general registers on the stack
in the following order: <tt>ax</tt>, <tt>cx</tt>, <tt>dx</tt>, <tt>bx</tt>, the
initial value of <tt>sp</tt> before <tt>ax</tt> was pushed, <tt>bp</tt>, <tt>si</tt>
and <tt>di</tt>. The 32-bit version pushes equivalent 32-bit general
registers in the same order.

<div class="p"><!----></div>
<a class="a" id="pop"></a>
<a class="a" id="popw"></a>
<a class="a" id="popd"></a>
<tt>pop</tt> transfers the word or double word at the current top of stack to
the destination operand, and then increments <tt>esp</tt> to point to the new
top of stack. The operand can be memory, general register or segment
register. <tt>popw</tt> and <tt>popd</tt> mnemonics are variants of this
instruction for restoring the values of word or double word size respectively.
If more operands separated with spaces follow in the same line, compiler will
assemble chain of the <tt>pop</tt> instructions with these operands.

................................................................................
&nbsp;&nbsp;&nbsp;&nbsp;pop&nbsp;bx&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;;&nbsp;restore&nbsp;general&nbsp;register
&nbsp;&nbsp;&nbsp;&nbsp;pop&nbsp;ds&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;;&nbsp;restore&nbsp;segment&nbsp;register
&nbsp;&nbsp;&nbsp;&nbsp;popw&nbsp;[si]&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;;&nbsp;restore&nbsp;memory

</pre>

<div class="p"><!----></div>
<a class="a" id="popa"></a>
<a class="a" id="popaw"></a>
<a class="a" id="popad"></a>
<tt>popa</tt> restores the registers saved on the stack by <tt>pusha</tt>
instruction, except for the saved value of <tt>sp</tt> (or <tt>esp</tt>),
which is ignored. This instruction has no operands. To force assembling
16-bit or 32-bit version of this instruction use <tt>popaw</tt> or
<tt>popad</tt> mnemonic.
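
<div class="p"><!----></div>
A typical pairing of <tt>pusha</tt> and <tt>popa</tt> might look like the
sketch below; the value loaded in between is arbitrary and only shows that
the registers may be freely changed before they are restored.

<pre>
&nbsp;&nbsp;&nbsp;&nbsp;pusha&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;;&nbsp;save&nbsp;the&nbsp;eight&nbsp;general&nbsp;registers
&nbsp;&nbsp;&nbsp;&nbsp;mov&nbsp;ax,1234h&nbsp;&nbsp;;&nbsp;registers&nbsp;can&nbsp;be&nbsp;modified&nbsp;here
&nbsp;&nbsp;&nbsp;&nbsp;popa&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;;&nbsp;restore&nbsp;them,&nbsp;ax&nbsp;has&nbsp;its&nbsp;previous&nbsp;value&nbsp;again

</pre>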

<div class="p"><!----></div>
     <a id="tth_sEc2.1.2"></a><h3>
................................................................................
The type conversion instructions convert bytes into words, words into double
words, and double words into quad words. These conversions can be done using
the sign extension or zero extension. The sign extension fills the extra bits
of the larger item with the value of the sign bit of the smaller item,
the zero extension simply fills them with zeros.

<div class="p"><!----></div>
<a class="a" id="cwd"></a>
<a class="a" id="cdq"></a>
<tt>cwd</tt> and <tt>cdq</tt> double the size of the value in the <tt>ax</tt> or <tt>eax</tt>
register respectively and store the extra bits into the <tt>dx</tt> or
<tt>edx</tt> register. The conversion is done using the sign extension.
These instructions have no operands.

<div class="p"><!----></div>
<a class="a" id="cbw"></a>
<a class="a" id="cwde"></a>
<tt>cbw</tt> extends the sign of the byte in <tt>al</tt> throughout <tt>ax</tt>,
and <tt>cwde</tt> extends the sign of the word in <tt>ax</tt> throughout
<tt>eax</tt>. These instructions also have no operands.
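
<div class="p"><!----></div>
The effect of these sign extensions can be sketched with an arbitrary
negative value; the register contents in the comments assume the value
loaded in the first line.

<pre>
&nbsp;&nbsp;&nbsp;&nbsp;mov&nbsp;al,-3&nbsp;&nbsp;&nbsp;;&nbsp;al&nbsp;=&nbsp;0FDh
&nbsp;&nbsp;&nbsp;&nbsp;cbw&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;;&nbsp;ax&nbsp;=&nbsp;0FFFDh
&nbsp;&nbsp;&nbsp;&nbsp;cwd&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;;&nbsp;dx&nbsp;=&nbsp;0FFFFh,&nbsp;ax&nbsp;=&nbsp;0FFFDh
&nbsp;&nbsp;&nbsp;&nbsp;cwde&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;;&nbsp;eax&nbsp;=&nbsp;0FFFFFFFDh

</pre>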

<div class="p"><!----></div>
<a class="a" id="movsx"></a>
<a class="a" id="movzx"></a>
<tt>movsx</tt> converts a byte to word or double word and a word to double word
using the sign extension. <tt>movzx</tt> does the same, but it uses the zero
extension. The source operand can be general register or memory, while the
destination operand must be a general register. For example:

<pre>
&nbsp;&nbsp;&nbsp;&nbsp;movsx&nbsp;ax,al&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;;&nbsp;byte&nbsp;register&nbsp;to&nbsp;word&nbsp;register
&nbsp;&nbsp;&nbsp;&nbsp;movsx&nbsp;edx,dl&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;;&nbsp;byte&nbsp;register&nbsp;to&nbsp;double&nbsp;word&nbsp;register
................................................................................
&nbsp;&nbsp;&nbsp;&nbsp;movsx&nbsp;eax,word&nbsp;[bx]&nbsp;;&nbsp;word&nbsp;memory&nbsp;to&nbsp;double&nbsp;word&nbsp;register

</pre>

<div class="p"><!----></div>
     <a id="tth_sEc2.1.3"></a><h3>
2.1.3&nbsp;&nbsp;Binary arithmetic instructions</h3>
<a class="a" id="add"></a>

<tt>add</tt> replaces the destination operand with the sum of the source and
destination operands and sets CF if a carry (unsigned overflow) has occurred. The operands may
be bytes, words or double words. The destination operand can be general
register or memory, the source operand can be general register or immediate
value, it can also be memory if the destination operand is register.

<pre>
................................................................................
&nbsp;&nbsp;&nbsp;&nbsp;add&nbsp;[di],al&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;;&nbsp;add&nbsp;register&nbsp;to&nbsp;memory
&nbsp;&nbsp;&nbsp;&nbsp;add&nbsp;al,48&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;;&nbsp;add&nbsp;immediate&nbsp;value&nbsp;to&nbsp;register
&nbsp;&nbsp;&nbsp;&nbsp;add&nbsp;[char],48&nbsp;&nbsp;&nbsp;;&nbsp;add&nbsp;immediate&nbsp;value&nbsp;to&nbsp;memory

</pre>

<div class="p"><!----></div>
<a class="a" id="adc"></a>
<tt>adc</tt> sums the operands, adds one if CF is set, and replaces the
destination operand with the result. Rules for the operands are the same as
for the <tt>add</tt> instruction. An <tt>add</tt> followed by multiple <tt>adc</tt>
instructions can be used to add numbers longer than 32 bits.
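
<div class="p"><!----></div>
For instance, a 64-bit addition could be sketched as follows, assuming (only
for this example) that the first number is held in <tt>edx</tt> and
<tt>eax</tt> and the second one in <tt>ebx</tt> and <tt>ecx</tt>, with the
high double word named first.

<pre>
&nbsp;&nbsp;&nbsp;&nbsp;add&nbsp;eax,ecx&nbsp;&nbsp;;&nbsp;add&nbsp;the&nbsp;low&nbsp;double&nbsp;words,&nbsp;CF&nbsp;receives&nbsp;the&nbsp;carry
&nbsp;&nbsp;&nbsp;&nbsp;adc&nbsp;edx,ebx&nbsp;&nbsp;;&nbsp;add&nbsp;the&nbsp;high&nbsp;double&nbsp;words&nbsp;and&nbsp;the&nbsp;carry

</pre>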

<div class="p"><!----></div>
<a class="a" id="inc"></a>
<tt>inc</tt> adds one to the operand, it does not affect CF. The operand can be
a general register or memory, and the size of the operand can be byte, word or double word.

<pre>
&nbsp;&nbsp;&nbsp;&nbsp;inc&nbsp;ax&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;;&nbsp;increment&nbsp;register&nbsp;by&nbsp;one
&nbsp;&nbsp;&nbsp;&nbsp;inc&nbsp;byte&nbsp;[bx]&nbsp;&nbsp;&nbsp;;&nbsp;increment&nbsp;memory&nbsp;by&nbsp;one

</pre>

<div class="p"><!----></div>
<a class="a" id="sub"></a>
<tt>sub</tt> subtracts the source operand from the destination operand and
replaces the destination operand with the result. If a borrow is required,
the CF is set. Rules for the operands are the same as for the <tt>add</tt>
instruction.

<div class="p"><!----></div>
<a class="a" id="sbb"></a>
<tt>sbb</tt> subtracts the source operand from the destination operand,
subtracts one if CF is set, and stores the result to the destination operand.
Rules for the operands are the same as for the <tt>add</tt> instruction.
A <tt>sub</tt> followed by multiple <tt>sbb</tt> instructions may be used to
subtract numbers longer than 32 bits.

<div class="p"><!----></div>
<a class="a" id="dec"></a>
<tt>dec</tt> subtracts one from the operand, it does not affect CF. Rules for
the operand are the same as for the <tt>inc</tt> instruction.

<div class="p"><!----></div>
<a class="a" id="cmp"></a>
<tt>cmp</tt> subtracts the source operand from the destination operand. It
updates the flags as the <tt>sub</tt> instruction, but does not alter the
source and destination operands. Rules for the operands are the same as for
the <tt>sub</tt> instruction.

<div class="p"><!----></div>
<a class="a" id="neg"></a>
<tt>neg</tt> subtracts a signed integer operand from zero. The effect of this
instruction is to reverse the sign of the operand from positive to negative or
from negative to positive. Rules for the operand are the same as for the
<tt>inc</tt> instruction.
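
<div class="p"><!----></div>
A short sketch with an arbitrary value shows the sign being reversed twice:

<pre>
&nbsp;&nbsp;&nbsp;&nbsp;mov&nbsp;ax,5&nbsp;&nbsp;;&nbsp;ax&nbsp;=&nbsp;5
&nbsp;&nbsp;&nbsp;&nbsp;neg&nbsp;ax&nbsp;&nbsp;&nbsp;&nbsp;;&nbsp;ax&nbsp;=&nbsp;-5&nbsp;(0FFFBh)
&nbsp;&nbsp;&nbsp;&nbsp;neg&nbsp;ax&nbsp;&nbsp;&nbsp;&nbsp;;&nbsp;ax&nbsp;=&nbsp;5&nbsp;again

</pre>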

<div class="p"><!----></div>
<a class="a" id="xadd"></a>
<tt>xadd</tt> exchanges the destination operand with the source operand,
then loads the sum of the two values into the destination operand.  The destination operand
may be a general register or memory, the source operand must be a general register.
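
<div class="p"><!----></div>
With two illustrative values the exchange and the addition can be followed
step by step:

<pre>
&nbsp;&nbsp;&nbsp;&nbsp;mov&nbsp;ax,2&nbsp;&nbsp;&nbsp;&nbsp;;&nbsp;destination&nbsp;operand
&nbsp;&nbsp;&nbsp;&nbsp;mov&nbsp;bx,5&nbsp;&nbsp;&nbsp;&nbsp;;&nbsp;source&nbsp;operand
&nbsp;&nbsp;&nbsp;&nbsp;xadd&nbsp;ax,bx&nbsp;&nbsp;;&nbsp;ax&nbsp;=&nbsp;7&nbsp;(the&nbsp;sum),&nbsp;bx&nbsp;=&nbsp;2&nbsp;(the&nbsp;old&nbsp;value&nbsp;of&nbsp;ax)

</pre>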

<div class="p"><!----></div>
All the above binary arithmetic instructions update SF, ZF, PF and OF flags.
SF is always set to the same value as the result's sign bit, ZF is set
when all the bits of result are zero, PF is set when low order eight bits of
result contain an even number of set bits, OF is set if result is too large for a
positive number or too small for a negative number (excluding sign bit) to fit in
destination operand.

<div class="p"><!----></div>
<a class="a" id="mul"></a>
<tt>mul</tt> performs an unsigned multiplication of the operand and the
accumulator. If the operand is a byte, the processor multiplies it by the
contents of <tt>al</tt> and returns the 16-bit result to <tt>ah</tt> and
<tt>al</tt>. If the operand is a word, the processor multiplies it by the
contents of <tt>ax</tt> and returns the 32-bit result to <tt>dx</tt> and
<tt>ax</tt>. If the operand is a double word, the processor multiplies it by
the contents of <tt>eax</tt> and returns the 64-bit result in <tt>edx</tt> and
<tt>eax</tt>. <tt>mul</tt> sets CF and OF when the upper half of the result is
nonzero, otherwise they are cleared. Rules for the operand are the same as
for the <tt>inc</tt> instruction.
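
<div class="p"><!----></div>
A byte multiplication whose result does not fit in <tt>al</tt> could look
like this sketch (the factors are arbitrary):

<pre>
&nbsp;&nbsp;&nbsp;&nbsp;mov&nbsp;al,200&nbsp;&nbsp;;&nbsp;first&nbsp;factor&nbsp;in&nbsp;the&nbsp;accumulator
&nbsp;&nbsp;&nbsp;&nbsp;mov&nbsp;bl,3&nbsp;&nbsp;&nbsp;&nbsp;;&nbsp;second&nbsp;factor
&nbsp;&nbsp;&nbsp;&nbsp;mul&nbsp;bl&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;;&nbsp;ax&nbsp;=&nbsp;600&nbsp;(0258h),&nbsp;CF&nbsp;and&nbsp;OF&nbsp;are&nbsp;set&nbsp;as&nbsp;ah&nbsp;is&nbsp;nonzero

</pre>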

<div class="p"><!----></div>
<a class="a" id="imul"></a>
<tt>imul</tt> performs a signed multiplication operation. This
instruction has three variations. First has one operand and
behaves in the same way as the <tt>mul</tt> instruction. Second has
two operands, in this case destination operand is multiplied by
the source operand and the result replaces the destination
operand. Destination operand must be a general register, it can be
word or double word, source operand can be general register,
memory or immediate value. Third form has three operands, the
................................................................................
&nbsp;&nbsp;&nbsp;&nbsp;imul&nbsp;bx,10&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;;&nbsp;register&nbsp;by&nbsp;immediate&nbsp;value
&nbsp;&nbsp;&nbsp;&nbsp;imul&nbsp;ax,bx,10&nbsp;&nbsp;&nbsp;;&nbsp;register&nbsp;by&nbsp;immediate&nbsp;value&nbsp;to&nbsp;register
&nbsp;&nbsp;&nbsp;&nbsp;imul&nbsp;ax,[si],10&nbsp;;&nbsp;memory&nbsp;by&nbsp;immediate&nbsp;value&nbsp;to&nbsp;register

</pre>

<div class="p"><!----></div>
<a class="a" id="div"></a>
<tt>div</tt> performs an unsigned division of the accumulator by the operand.
The dividend (the accumulator) is twice the size of the divisor (the
operand), the quotient and remainder have the same size as the divisor.
If divisor is byte, the dividend is taken from <tt>ax</tt> register, the
quotient is stored in <tt>al</tt> and the remainder is stored in <tt>ah</tt>.
If divisor is word, the upper half of dividend is taken from <tt>dx</tt>,
the lower half of dividend is taken from <tt>ax</tt>, the quotient is stored
in <tt>ax</tt> and the remainder is stored in <tt>dx</tt>. If divisor is double
word, the upper half of dividend is taken from <tt>edx</tt>, the lower half of
dividend is taken from <tt>eax</tt>, the quotient is stored in <tt>eax</tt> and
the remainder is stored in <tt>edx</tt>. Rules for the operand are the same as
for the <tt>mul</tt> instruction.
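
<div class="p"><!----></div>
For the byte divisor case, a minimal sketch with arbitrary values is:

<pre>
&nbsp;&nbsp;&nbsp;&nbsp;mov&nbsp;ax,100&nbsp;&nbsp;;&nbsp;dividend&nbsp;in&nbsp;ax
&nbsp;&nbsp;&nbsp;&nbsp;mov&nbsp;bl,7&nbsp;&nbsp;&nbsp;&nbsp;;&nbsp;byte&nbsp;divisor
&nbsp;&nbsp;&nbsp;&nbsp;div&nbsp;bl&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;;&nbsp;al&nbsp;=&nbsp;14&nbsp;(quotient),&nbsp;ah&nbsp;=&nbsp;2&nbsp;(remainder)

</pre>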

<div class="p"><!----></div>
<a class="a" id="idiv"></a>
<tt>idiv</tt> performs a signed division of the accumulator by the operand.
It uses the same registers as the <tt>div</tt> instruction, and the rules for
the operand are the same.
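
<div class="p"><!----></div>
Before a signed word division the dividend usually has to be sign-extended
into <tt>dx</tt>, which can be done with <tt>cwd</tt>. The sketch below uses
arbitrary values; the processor truncates the quotient toward zero, so the
remainder takes the sign of the dividend.

<pre>
&nbsp;&nbsp;&nbsp;&nbsp;mov&nbsp;ax,-100&nbsp;&nbsp;;&nbsp;low&nbsp;half&nbsp;of&nbsp;the&nbsp;dividend
&nbsp;&nbsp;&nbsp;&nbsp;cwd&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;;&nbsp;sign-extend&nbsp;ax&nbsp;into&nbsp;dx
&nbsp;&nbsp;&nbsp;&nbsp;mov&nbsp;bx,7&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;;&nbsp;word&nbsp;divisor
&nbsp;&nbsp;&nbsp;&nbsp;idiv&nbsp;bx&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;;&nbsp;ax&nbsp;=&nbsp;-14&nbsp;(quotient),&nbsp;dx&nbsp;=&nbsp;-2&nbsp;(remainder)

</pre>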

<div class="p"><!----></div>
     <a id="tth_sEc2.1.4"></a><h3>
2.1.4&nbsp;&nbsp;Decimal arithmetic instructions</h3>
Decimal arithmetic is performed by combining the binary arithmetic
................................................................................
arithmetic instructions. The decimal arithmetic instructions are used to
adjust the results of a previous binary arithmetic operation to produce a
valid packed or unpacked decimal result, or to adjust the inputs to a
subsequent binary arithmetic operation so the operation will produce a valid
packed or unpacked decimal result.

<div class="p"><!----></div>
<a class="a" id="daa"></a>
<tt>daa</tt> adjusts the result of adding two valid packed decimal operands in
<tt>al</tt>. <tt>daa</tt> must always follow the addition of two pairs of packed
decimal numbers (one digit in each half-byte) to obtain a pair of valid
packed decimal digits as results. The carry flag is set if carry was needed.
This instruction has no operands.
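
<div class="p"><!----></div>
As an illustration with arbitrary operands, adding the packed decimal numbers
38 and 45 first gives an invalid result, which <tt>daa</tt> then corrects:

<pre>
&nbsp;&nbsp;&nbsp;&nbsp;mov&nbsp;al,38h&nbsp;&nbsp;;&nbsp;packed&nbsp;decimal&nbsp;38
&nbsp;&nbsp;&nbsp;&nbsp;add&nbsp;al,45h&nbsp;&nbsp;;&nbsp;al&nbsp;=&nbsp;7Dh,&nbsp;not&nbsp;a&nbsp;valid&nbsp;packed&nbsp;decimal
&nbsp;&nbsp;&nbsp;&nbsp;daa&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;;&nbsp;al&nbsp;=&nbsp;83h,&nbsp;packed&nbsp;decimal&nbsp;83

</pre>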

<div class="p"><!----></div>
<a class="a" id="das"></a>
<tt>das</tt> adjusts the result of subtracting two valid packed decimal
operands in <tt>al</tt>. <tt>das</tt> must always follow the subtraction of one
pair of packed decimal numbers (one digit in each half-byte) from another
to obtain a pair of valid packed decimal digits as results. The carry flag is
set if a borrow was needed. This instruction has no operands.

<div class="p"><!----></div>
<a class="a" id="aaa"></a>
<tt>aaa</tt> changes the contents of register <tt>al</tt> to a valid unpacked
decimal number, and zeroes the top four bits. <tt>aaa</tt> must always follow
the addition of two unpacked decimal operands in <tt>al</tt>. The carry flag is
set and <tt>ah</tt> is incremented if a carry is necessary. This instruction
has no operands.

<div class="p"><!----></div>
<a class="a" id="aas"></a>
<tt>aas</tt> changes the contents of register <tt>al</tt> to a valid unpacked
decimal number, and zeroes the top four bits. <tt>aas</tt> must always follow
the subtraction of one unpacked decimal operand from another in <tt>al</tt>.
The carry flag is set and <tt>ah</tt> decremented if a borrow is necessary.
This instruction has no operands.

<div class="p"><!----></div>
<a class="a" id="aam"></a>
<tt>aam</tt> corrects the result of a multiplication of two valid unpacked
decimal numbers. <tt>aam</tt> must always follow the multiplication of two
decimal numbers to produce a valid decimal result. The high order digit is
left in <tt>ah</tt>, the low order digit in <tt>al</tt>. The generalized version
of this instruction allows adjustment of the contents of the <tt>ax</tt> to
create two unpacked digits of any number base. The standard version of this
instruction has no operands, the generalized version has one operand - an
immediate value specifying the number base for the created digits.
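
<div class="p"><!----></div>
The standard version can be sketched with two arbitrary unpacked digits:

<pre>
&nbsp;&nbsp;&nbsp;&nbsp;mov&nbsp;al,7&nbsp;&nbsp;;&nbsp;first&nbsp;unpacked&nbsp;digit
&nbsp;&nbsp;&nbsp;&nbsp;mov&nbsp;bl,9&nbsp;&nbsp;;&nbsp;second&nbsp;unpacked&nbsp;digit
&nbsp;&nbsp;&nbsp;&nbsp;mul&nbsp;bl&nbsp;&nbsp;&nbsp;&nbsp;;&nbsp;ax&nbsp;=&nbsp;63&nbsp;(003Fh)
&nbsp;&nbsp;&nbsp;&nbsp;aam&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;;&nbsp;ah&nbsp;=&nbsp;6,&nbsp;al&nbsp;=&nbsp;3

</pre>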

<div class="p"><!----></div>
<a class="a" id="aad"></a>
<tt>aad</tt> modifies the numerator in <tt>ah</tt> and <tt>al</tt> to prepare for
the division of two valid unpacked decimal operands so that the quotient
produced by the division will be a valid unpacked decimal number. <tt>ah</tt>
should contain the high order digit and <tt>al</tt> the low order digit.
This instruction adjusts the value and places the result in <tt>al</tt>, while
<tt>ah</tt> will contain zero. The generalized version of this instruction
allows adjustment of two unpacked digits of any number base. Rules for the
operand are the same as for the <tt>aam</tt> instruction.
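
<div class="p"><!----></div>
A sketch of the standard version, dividing the unpacked decimal number 63 by
an arbitrarily chosen digit:

<pre>
&nbsp;&nbsp;&nbsp;&nbsp;mov&nbsp;ax,0603h&nbsp;&nbsp;;&nbsp;ah&nbsp;=&nbsp;6,&nbsp;al&nbsp;=&nbsp;3,&nbsp;unpacked&nbsp;decimal&nbsp;63
&nbsp;&nbsp;&nbsp;&nbsp;aad&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;;&nbsp;al&nbsp;=&nbsp;63&nbsp;(3Fh),&nbsp;ah&nbsp;=&nbsp;0
&nbsp;&nbsp;&nbsp;&nbsp;mov&nbsp;bl,9&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;;&nbsp;divisor
&nbsp;&nbsp;&nbsp;&nbsp;div&nbsp;bl&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;;&nbsp;al&nbsp;=&nbsp;7&nbsp;(quotient),&nbsp;ah&nbsp;=&nbsp;0&nbsp;(remainder)

</pre>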

<div class="p"><!----></div>
     <a id="tth_sEc2.1.5"></a><h3>
2.1.5&nbsp;&nbsp;Logical instructions</h3>
<a class="a" id="not"></a>

<tt>not</tt> inverts the bits in the specified operand to form a one's
complement of the operand. It has no effect on the flags. Rules for the
operand are the same as for the <tt>inc</tt> instruction.

<div class="p"><!----></div>
<a class="a" id="and"></a>
<a class="a" id="or"></a>
<a class="a" id="xor"></a>
<tt>and</tt>, <tt>or</tt> and <tt>xor</tt> instructions perform the standard
logical operations. They update the SF, ZF and PF flags. Rules for the
operands are the same as for the <tt>add</tt> instruction.
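
<div class="p"><!----></div>
Some common uses of these instructions, given here only as an illustrative
sketch with arbitrary masks:

<pre>
&nbsp;&nbsp;&nbsp;&nbsp;and&nbsp;al,0Fh&nbsp;&nbsp;;&nbsp;keep&nbsp;only&nbsp;the&nbsp;low&nbsp;four&nbsp;bits
&nbsp;&nbsp;&nbsp;&nbsp;or&nbsp;al,30h&nbsp;&nbsp;&nbsp;;&nbsp;set&nbsp;bits&nbsp;4&nbsp;and&nbsp;5
&nbsp;&nbsp;&nbsp;&nbsp;xor&nbsp;ax,ax&nbsp;&nbsp;&nbsp;;&nbsp;clear&nbsp;the&nbsp;whole&nbsp;register

</pre>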

<div class="p"><!----></div>
<a class="a" id="bt"></a>
<a class="a" id="bts"></a>
<a class="a" id="btr"></a>
<a class="a" id="btc"></a>
<tt>bt</tt>, <tt>bts</tt>, <tt>btr</tt> and <tt>btc</tt> instructions operate on a
single bit which can be in memory or in a general register. The location of
the bit is specified as an offset from the low order end of the operand.
The value of the offset is taken from the second operand, which may be
either an immediate byte or a general register. These instructions first assign
the value of the selected bit to CF. <tt>bt</tt> instruction does nothing more,
<tt>bts</tt> sets the selected bit to 1, <tt>btr</tt> resets the selected bit to
0, <tt>btc</tt> changes the bit to its complement. The first operand can be
................................................................................
&nbsp;&nbsp;&nbsp;&nbsp;bts&nbsp;word&nbsp;[bx],15&nbsp;;&nbsp;test&nbsp;and&nbsp;set&nbsp;bit&nbsp;in&nbsp;memory
&nbsp;&nbsp;&nbsp;&nbsp;btr&nbsp;ax,cx&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;;&nbsp;test&nbsp;and&nbsp;reset&nbsp;bit&nbsp;in&nbsp;register
&nbsp;&nbsp;&nbsp;&nbsp;btc&nbsp;word&nbsp;[bx],cx&nbsp;;&nbsp;test&nbsp;and&nbsp;complement&nbsp;bit&nbsp;in&nbsp;memory

</pre>

<div class="p"><!----></div>
<a class="a" id="bsf"></a>
<a class="a" id="bsr"></a>
<tt>bsf</tt> and <tt>bsr</tt> instructions scan a word or double word for first
set bit and store the index of this bit into destination operand, which must
be general register. The bit string being scanned is specified by source
operand, it may be either general register or memory. The ZF flag is set if
the entire string is zero (no set bits are found); otherwise it is cleared.
If no set bit is found, the value of the destination register is undefined.
<tt>bsf</tt> scans from low order to high order (starting from bit index zero).
<tt>bsr</tt> scans from high order to low order (starting from bit index 15 of
................................................................................
<pre>
&nbsp;&nbsp;&nbsp;&nbsp;bsf&nbsp;ax,bx&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;;&nbsp;scan&nbsp;register&nbsp;forward
&nbsp;&nbsp;&nbsp;&nbsp;bsr&nbsp;ax,[si]&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;;&nbsp;scan&nbsp;memory&nbsp;reverse

</pre>

<div class="p"><!----></div>
<a class="a" id="shl"></a>
<tt>shl</tt> shifts the destination operand left by the number of bits
specified in the second operand. The destination operand can be byte, word,
or double word general register or memory. The second operand can be an
immediate value or the <tt>cl</tt> register. The processor shifts zeros in from
the right (low order) side of the operand as bits exit from the left side.
The last bit that exited is stored in CF. <tt>sal</tt> is a synonym for
<tt>shl</tt>.

................................................................................
&nbsp;&nbsp;&nbsp;&nbsp;shl&nbsp;byte&nbsp;[bx],1&nbsp;&nbsp;;&nbsp;shift&nbsp;memory&nbsp;left&nbsp;by&nbsp;one&nbsp;bit
&nbsp;&nbsp;&nbsp;&nbsp;shl&nbsp;ax,cl&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;;&nbsp;shift&nbsp;register&nbsp;left&nbsp;by&nbsp;count&nbsp;from&nbsp;cl
&nbsp;&nbsp;&nbsp;&nbsp;shl&nbsp;word&nbsp;[bx],cl&nbsp;;&nbsp;shift&nbsp;memory&nbsp;left&nbsp;by&nbsp;count&nbsp;from&nbsp;cl

</pre>

<div class="p"><!----></div>
<a class="a" id="shr"></a>
<a class="a" id="sar"></a>
<tt>shr</tt> and <tt>sar</tt> shift the destination operand right by the number
of bits specified in the second operand. Rules for operands are the same as
for the <tt>shl</tt> instruction. <tt>shr</tt> shifts zeros in from the left side
of the operand as bits exit from the right side. The last bit that exited is
stored in CF. <tt>sar</tt> preserves the sign of the operand by shifting in
zeros on the left side if the value is positive or by shifting in ones if the
value is negative.
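
<div class="p"><!----></div>
The difference between the two instructions shows up with a negative value;
the following sketch applies both to the same arbitrary bit pattern.

<pre>
&nbsp;&nbsp;&nbsp;&nbsp;mov&nbsp;al,-8&nbsp;&nbsp;&nbsp;&nbsp;;&nbsp;al&nbsp;=&nbsp;0F8h
&nbsp;&nbsp;&nbsp;&nbsp;sar&nbsp;al,1&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;;&nbsp;al&nbsp;=&nbsp;0FCh&nbsp;(-4),&nbsp;the&nbsp;sign&nbsp;is&nbsp;preserved
&nbsp;&nbsp;&nbsp;&nbsp;mov&nbsp;bl,0F8h&nbsp;&nbsp;;&nbsp;the&nbsp;same&nbsp;bit&nbsp;pattern
&nbsp;&nbsp;&nbsp;&nbsp;shr&nbsp;bl,1&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;;&nbsp;bl&nbsp;=&nbsp;7Ch,&nbsp;a&nbsp;zero&nbsp;is&nbsp;shifted&nbsp;in&nbsp;from&nbsp;the&nbsp;left

</pre>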

<div class="p"><!----></div>
<a class="a" id="shld"></a>
<tt>shld</tt> shifts bits of the destination operand to the left by the number
of bits specified in third operand, while shifting high order bits from the
source operand into the destination operand on the right. The source operand
remains unmodified. The destination operand can be a word or double word
general register or memory, the source operand must be a general register,
third operand can be an immediate value or the <tt>cl</tt> register.

<pre>
................................................................................
&nbsp;&nbsp;&nbsp;&nbsp;shld&nbsp;[di],bx,1&nbsp;&nbsp;&nbsp;;&nbsp;shift&nbsp;memory&nbsp;left&nbsp;by&nbsp;one&nbsp;bit
&nbsp;&nbsp;&nbsp;&nbsp;shld&nbsp;ax,bx,cl&nbsp;&nbsp;&nbsp;&nbsp;;&nbsp;shift&nbsp;register&nbsp;left&nbsp;by&nbsp;count&nbsp;from&nbsp;cl
&nbsp;&nbsp;&nbsp;&nbsp;shld&nbsp;[di],bx,cl&nbsp;&nbsp;;&nbsp;shift&nbsp;memory&nbsp;left&nbsp;by&nbsp;count&nbsp;from&nbsp;cl

</pre>

<div class="p"><!----></div>
<a class="a" id="shrd"></a>
<tt>shrd</tt> shifts bits of the destination operand to the right, while
shifting low order bits from the source operand into the destination operand
on the left. The source operand remains unmodified. Rules for operands are
the same as for the <tt>shld</tt> instruction.

<div class="p"><!----></div>
<a class="a" id="rol"></a>
<a class="a" id="rcl"></a>
<tt>rol</tt> and <tt>rcl</tt> rotate the byte, word or double word destination
operand left by the number of bits specified in the second operand. For each
rotation specified, the high order bit that exits from the left of the
operand returns at the right to become the new low order bit. <tt>rcl</tt>
additionally puts in CF each high order bit that exits from the left side
of the operand before it returns to the operand as the low order bit on the
next rotation cycle. Rules for operands are the same as for the <tt>shl</tt>
instruction.
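
<div class="p"><!----></div>
The two rotations can be compared on the same arbitrary value; in this sketch
<tt>clc</tt> is used only to put CF into a known state before <tt>rcl</tt>.

<pre>
&nbsp;&nbsp;&nbsp;&nbsp;mov&nbsp;al,81h&nbsp;&nbsp;;&nbsp;al&nbsp;=&nbsp;10000001b
&nbsp;&nbsp;&nbsp;&nbsp;rol&nbsp;al,1&nbsp;&nbsp;&nbsp;&nbsp;;&nbsp;al&nbsp;=&nbsp;03h,&nbsp;the&nbsp;exited&nbsp;bit&nbsp;returns&nbsp;as&nbsp;bit&nbsp;0
&nbsp;&nbsp;&nbsp;&nbsp;mov&nbsp;al,81h&nbsp;&nbsp;;&nbsp;the&nbsp;same&nbsp;value&nbsp;again
&nbsp;&nbsp;&nbsp;&nbsp;clc&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;;&nbsp;CF&nbsp;=&nbsp;0
&nbsp;&nbsp;&nbsp;&nbsp;rcl&nbsp;al,1&nbsp;&nbsp;&nbsp;&nbsp;;&nbsp;al&nbsp;=&nbsp;02h,&nbsp;CF&nbsp;=&nbsp;1,&nbsp;the&nbsp;previous&nbsp;CF&nbsp;entered&nbsp;as&nbsp;bit&nbsp;0

</pre>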

<div class="p"><!----></div>
<a class="a" id="ror"></a>
<a class="a" id="rcr"></a>
<tt>ror</tt> and <tt>rcr</tt> rotate the byte, word or double word destination
operand right by the number of bits specified in the second operand. For each
rotation specified, the low order bit that exits from the right of the
operand returns at the left to become the new high order bit. <tt>rcr</tt>
additionally puts in CF each low order bit that exits from the right side of
the operand before it returns to the operand as the high order bit on the
next rotation cycle. Rules for operands are the same as for the <tt>shl</tt>
instruction.

<div class="p"><!----></div>
<a class="a" id="test"></a>
<tt>test</tt> performs the same action as the <tt>and</tt> instruction, but it
does not alter the destination operand, only updates flags. Rules for the
operands are the same as for the <tt>and</tt> instruction.
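
<div class="p"><!----></div>
A common use is checking a single bit before a conditional jump. In the
sketch below <tt>is_odd</tt> is a hypothetical label introduced only for this
example.

<pre>
&nbsp;&nbsp;&nbsp;&nbsp;test&nbsp;al,1&nbsp;&nbsp;&nbsp;&nbsp;;&nbsp;ZF&nbsp;is&nbsp;set&nbsp;when&nbsp;bit&nbsp;0&nbsp;of&nbsp;al&nbsp;is&nbsp;clear,&nbsp;al&nbsp;is&nbsp;unchanged
&nbsp;&nbsp;&nbsp;&nbsp;jnz&nbsp;is_odd&nbsp;&nbsp;&nbsp;;&nbsp;taken&nbsp;when&nbsp;bit&nbsp;0&nbsp;is&nbsp;set
&nbsp;&nbsp;&nbsp;&nbsp;inc&nbsp;al&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;;&nbsp;reached&nbsp;only&nbsp;for&nbsp;even&nbsp;values
&nbsp;&nbsp;is_odd:

</pre>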

<div class="p"><!----></div>
<a class="a" id="bswap"></a>
<tt>bswap</tt> reverses the byte order of a 32-bit general register:
bits 0 through 7 are swapped with bits 24 through 31, and bits 8 through 15
are swapped with bits 16 through 23. This instruction is provided for
converting little-endian values to big-endian format and vice versa.

<pre>
&nbsp;&nbsp;&nbsp;&nbsp;bswap&nbsp;edx&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;;&nbsp;swap&nbsp;bytes&nbsp;in&nbsp;register

................................................................................
</pre>

<div class="p"><!----></div>
     <a id="tth_sEc2.1.6"></a><h3>
2.1.6&nbsp;&nbsp;Control transfer instructions</h3>

<div class="p"><!----></div>
<a class="a" id="jmp"></a>
<tt>jmp</tt> unconditionally transfers control to the target location. The
destination address can be specified directly within the instruction or
indirectly through a register or memory, the acceptable size of this address
depends on whether the jump is near or far (it can be specified by preceding
the operand with <tt>near</tt> or <tt>far</tt> operator) and whether the instruction is
16-bit or 32-bit. Operand for near jump should be <tt>word</tt> size for 16-bit
instruction or the <tt>dword</tt> size for 32-bit instruction. Operand for far jump
should be <tt>dword</tt> size for 16-bit instruction or <tt>pword</tt> size for 32-bit
................................................................................
&nbsp;&nbsp;&nbsp;&nbsp;jmp&nbsp;0FFFFh:0&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;;&nbsp;direct&nbsp;far&nbsp;jump
&nbsp;&nbsp;&nbsp;&nbsp;jmp&nbsp;ax&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;;&nbsp;indirect&nbsp;near&nbsp;jump
&nbsp;&nbsp;&nbsp;&nbsp;jmp&nbsp;pword&nbsp;[ebx]&nbsp;&nbsp;;&nbsp;indirect&nbsp;far&nbsp;jump

</pre>

<div class="p"><!----></div>
<a class="a" id="call"></a>
<tt>call</tt> transfers control to the procedure, saving on the stack the
address of the instruction following the <tt>call</tt> for later use by a
<tt>ret</tt> (return) instruction. Rules for the operands are the same as for
the <tt>jmp</tt> instruction, but <tt>call</tt> has no short variant of the
direct instruction and thus it is not optimized.

<div class="p"><!----></div>
<a class="a" id="ret"></a>
<a class="a" id="retn"></a>
<a class="a" id="retf"></a>
<a class="a" id="retw"></a>
<a class="a" id="retnw"></a>
<a class="a" id="retfw"></a>
<a class="a" id="retd"></a>
<a class="a" id="retnd"></a>
<a class="a" id="retfd"></a>
<tt>ret</tt>, <tt>retn</tt> and <tt>retf</tt> instructions terminate the execution
of a procedure and transfer control back to the program that originally
invoked the procedure, using the address that was stored on the stack by the
<tt>call</tt> instruction. <tt>ret</tt> is equivalent to <tt>retn</tt>, which
returns from the procedure that was executed using the near call, while
<tt>retf</tt> returns from the procedure that was executed using the far call.
These instructions default to the size of address appropriate for the current
code setting, but the size of address can be forced to 16-bit by using the
................................................................................
the <tt>retd</tt>, <tt>retnd</tt> and <tt>retfd</tt> mnemonics. All these
instructions may optionally specify an immediate operand; by adding this
constant to the stack pointer, they effectively remove any arguments that the
calling program pushed on the stack before the execution of the <tt>call</tt>
instruction.
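
<div class="p"><!----></div>
A minimal sketch of how <tt>call</tt> and <tt>ret</tt> with an immediate
operand cooperate, assuming 32-bit code. The labels <tt>sum2</tt> and
<tt>done</tt> and the argument values are made up for illustration:

<pre>
&nbsp;&nbsp;&nbsp;&nbsp;use32&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;;&nbsp;assumption:&nbsp;32-bit&nbsp;code&nbsp;setting
&nbsp;&nbsp;&nbsp;&nbsp;push&nbsp;2&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;;&nbsp;second&nbsp;argument
&nbsp;&nbsp;&nbsp;&nbsp;push&nbsp;1&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;;&nbsp;first&nbsp;argument
&nbsp;&nbsp;&nbsp;&nbsp;call&nbsp;sum2&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;;&nbsp;store&nbsp;the&nbsp;return&nbsp;address&nbsp;on&nbsp;the&nbsp;stack&nbsp;and&nbsp;jump
&nbsp;&nbsp;&nbsp;&nbsp;jmp&nbsp;done&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;;&nbsp;here&nbsp;eax&nbsp;=&nbsp;3&nbsp;and&nbsp;both&nbsp;arguments&nbsp;are&nbsp;already&nbsp;removed
&nbsp;&nbsp;sum2:
&nbsp;&nbsp;&nbsp;&nbsp;mov&nbsp;eax,[esp+4]&nbsp;;&nbsp;first&nbsp;argument&nbsp;(the&nbsp;return&nbsp;address&nbsp;is&nbsp;at&nbsp;[esp])
&nbsp;&nbsp;&nbsp;&nbsp;add&nbsp;eax,[esp+8]&nbsp;;&nbsp;second&nbsp;argument
&nbsp;&nbsp;&nbsp;&nbsp;ret&nbsp;8&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;;&nbsp;return&nbsp;and&nbsp;add&nbsp;8&nbsp;to&nbsp;esp,&nbsp;discarding&nbsp;the&nbsp;arguments
&nbsp;&nbsp;done:

</pre>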

<div class="p"><!----></div>
<a class="a" id="iret"></a>
<a class="a" id="iretw"></a>
<a class="a" id="iretd"></a>
<tt>iret</tt> returns control to an interrupted procedure. It differs from
<tt>ret</tt> in that it also pops the flags from the stack into the flags
register. The flags are stored on the stack by the interrupt mechanism. It
defaults to the size of return address appropriate for the current code
setting, but it can be forced to use 16-bit or 32-bit address by using the
<tt>iretw</tt> or <tt>iretd</tt> mnemonic.

<div class="p"><!----></div>
<a class="a" id="jo"></a>
<a class="a" id="jno"></a>
<a class="a" id="jc"></a>
<a class="a" id="jb"></a>
<a class="a" id="jnae"></a>
<a class="a" id="jnc"></a>
<a class="a" id="jae"></a>
<a class="a" id="jnb"></a>
<a class="a" id="je"></a>
<a class="a" id="jz"></a>
<a class="a" id="jne"></a>
<a class="a" id="jnz"></a>
<a class="a" id="jbe"></a>
<a class="a" id="jna"></a>
<a class="a" id="ja"></a>
<a class="a" id="jnbe"></a>
<a class="a" id="js"></a>
<a class="a" id="jns"></a>
<a class="a" id="jp"></a>
<a class="a" id="jpe"></a>
<a class="a" id="jnp"></a>
<a class="a" id="jpo"></a>
<a class="a" id="jl"></a>
<a class="a" id="jnge"></a>
<a class="a" id="jge"></a>
<a class="a" id="jnl"></a>
<a class="a" id="jle"></a>
<a class="a" id="jng"></a>
<a class="a" id="jg"></a>
<a class="a" id="jnle"></a>
The conditional transfer instructions are jumps that may or may not transfer
control, depending on the state of the CPU flags when the instruction
executes. The mnemonics for conditional jumps may be obtained by attaching
the condition mnemonic (see table <a href="#tab:conditions">2.1</a>) to the <tt>j</tt>
mnemonic; for example, the <tt>jc</tt> instruction transfers control when
the CF flag is set. The conditional jumps can be short or near, and direct only, and
can be optimized (see <a href="#sec:jumps">1.2.5</a>). The operand should be an immediate
value specifying the target address.
................................................................................

<div class="p"><!----></div>

<div class="p"><!----></div>
<a id="tth_tAb2.1">
</a> 
<div style="text-align:center">
<table border="1" class="tabular">
<tr><td align="center">Mnemonic </td><td align="center">Condition tested </td><td align="center">Description </td></tr><tr><td></td></tr>
<tr><td align="center"><tt>o</tt> </td><td align="center">OF = 1 </td><td align="center">overflow</td></tr>
<tr><td align="center"><tt>no</tt> </td><td align="center">OF = 0 </td><td align="center">not overflow</td></tr>
<tr><td align="center"><tt>c</tt> </td><td align="center">CF = 1 </td><td align="center">carry</td></tr>
<tr><td align="center"><tt>b</tt> </td><td align="center">CF = 1 </td><td align="center">below</td></tr>
<tr><td align="center"><tt>nae</tt> </td><td align="center">CF = 1 </td><td align="center">not above nor equal</td></tr>
<tr><td align="center"><tt>nc</tt> </td><td align="center">CF = 0 </td><td align="center">not carry</td></tr>
................................................................................
</div>

<div style="text-align:center">Table 2.1: Conditions.</div>
<a id="tab:conditions">
</a>
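
<div class="p"><!----></div>
A short sketch of a conditional jump in use, assuming two unsigned values
in <tt>eax</tt> and <tt>ebx</tt>; the label <tt>keep</tt> is hypothetical:

<pre>
&nbsp;&nbsp;&nbsp;&nbsp;cmp&nbsp;eax,ebx&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;;&nbsp;set&nbsp;the&nbsp;flags&nbsp;according&nbsp;to&nbsp;eax-ebx
&nbsp;&nbsp;&nbsp;&nbsp;jae&nbsp;keep&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;;&nbsp;CF&nbsp;=&nbsp;0:&nbsp;eax&nbsp;is&nbsp;above&nbsp;or&nbsp;equal&nbsp;(unsigned)
&nbsp;&nbsp;&nbsp;&nbsp;mov&nbsp;eax,ebx&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;;&nbsp;otherwise&nbsp;take&nbsp;the&nbsp;larger&nbsp;value&nbsp;from&nbsp;ebx
&nbsp;&nbsp;keep:&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;;&nbsp;eax&nbsp;now&nbsp;holds&nbsp;the&nbsp;unsigned&nbsp;maximum

</pre>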

<div class="p"><!----></div>
<a class="a" id="loop"></a>
<a class="a" id="loopw"></a>
<a class="a" id="loopd"></a>
<a class="a" id="loope"></a>
<a class="a" id="loopz"></a>

<a class="a" id="loopew"></a>
<a class="a" id="loopzw"></a>
<a class="a" id="looped"></a>
<a class="a" id="loopzd"></a>
<a class="a" id="loopne"></a>
<a class="a" id="loopnz"></a>
<a class="a" id="loopnew"></a>
<a class="a" id="loopnzw"></a>
<a class="a" id="loopned"></a>
<a class="a" id="loopnzd"></a>
The <tt>loop</tt> instructions are conditional jumps that use a value placed in
<tt>cx</tt> (or <tt>ecx</tt>) to specify the number of repetitions of a software
loop. All <tt>loop</tt> instructions automatically decrement <tt>cx</tt> (or
<tt>ecx</tt>) and terminate the loop (don't transfer the control) when
<tt>cx</tt> (or <tt>ecx</tt>) is zero. The <tt>loop</tt> instruction uses <tt>cx</tt> or <tt>ecx</tt> depending on whether
the current code setting is 16-bit or 32-bit, but it can be forced to use
<tt>cx</tt> with the <tt>loopw</tt> mnemonic or to use <tt>ecx</tt> with the
<tt>loopd</tt> mnemonic. <tt>loope</tt> and <tt>loopz</tt> are the synonyms for the
................................................................................
<tt>loopned</tt> and <tt>loopnzd</tt> force them to use <tt>ecx</tt> register.
Every <tt>loop</tt> instruction needs an operand that is an immediate value
specifying the target address; it can only be a short jump (in the range of 128
bytes back and 127 bytes forward from the address of the instruction following
the <tt>loop</tt> instruction).
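
<div class="p"><!----></div>
The counting behaviour can be sketched as follows. The label <tt>again</tt>
and the count of 5 are assumptions for illustration, and <tt>loopd</tt> is
used so that <tt>ecx</tt> is the counter regardless of the code setting:

<pre>
&nbsp;&nbsp;&nbsp;&nbsp;mov&nbsp;ecx,5&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;;&nbsp;number&nbsp;of&nbsp;repetitions
&nbsp;&nbsp;&nbsp;&nbsp;xor&nbsp;eax,eax&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;;&nbsp;running&nbsp;total
&nbsp;&nbsp;again:
&nbsp;&nbsp;&nbsp;&nbsp;add&nbsp;eax,ecx&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;;&nbsp;adds&nbsp;5,&nbsp;4,&nbsp;3,&nbsp;2,&nbsp;1&nbsp;in&nbsp;turn
&nbsp;&nbsp;&nbsp;&nbsp;loopd&nbsp;again&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;;&nbsp;decrement&nbsp;ecx,&nbsp;jump&nbsp;back&nbsp;while&nbsp;it&nbsp;is&nbsp;not&nbsp;zero
&nbsp;&nbsp;&nbsp;&nbsp;;&nbsp;here&nbsp;ecx&nbsp;=&nbsp;0&nbsp;and&nbsp;eax&nbsp;=&nbsp;15

</pre>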

<div class="p"><!----></div>
<a class="a" id="jcxz"></a>
<a class="a" id="jecxz"></a>
<tt>jcxz</tt> branches to the label specified in the instruction if it finds a
value of zero in <tt>cx</tt>; <tt>jecxz</tt> does the same, but checks the value
of <tt>ecx</tt> instead of <tt>cx</tt>. Rules for the operands are the same as
for the <tt>loop</tt> instruction.
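
<div class="p"><!----></div>
Because the <tt>loop</tt> instructions decrement the counter before testing
it, a loop entered with a zero count would not stop at once. A possible
guard is sketched below, assuming the count is already in <tt>ecx</tt>; the
labels <tt>again</tt> and <tt>skip</tt> are hypothetical:

<pre>
&nbsp;&nbsp;&nbsp;&nbsp;jecxz&nbsp;skip&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;;&nbsp;ecx&nbsp;=&nbsp;0:&nbsp;do&nbsp;not&nbsp;enter&nbsp;the&nbsp;loop&nbsp;at&nbsp;all
&nbsp;&nbsp;again:
&nbsp;&nbsp;&nbsp;&nbsp;shl&nbsp;eax,1&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;;&nbsp;loop&nbsp;body:&nbsp;double&nbsp;eax
&nbsp;&nbsp;&nbsp;&nbsp;loopd&nbsp;again&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;;&nbsp;repeat&nbsp;until&nbsp;ecx&nbsp;reaches&nbsp;zero
&nbsp;&nbsp;skip:

</pre>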

<div class="p"><!----></div>
<a class="a" id="int"></a>
<a class="a" id="int3"></a>
<a class="a" id="into"></a>
<tt>int</tt> activates the interrupt service routine that corresponds to the
number specified as an operand to the instruction; the number should be in
the range from 0 to 255. The interrupt service routine terminates with an
<tt>iret</tt> instruction that returns control to the instruction that follows
<tt>int</tt>. <tt>int3</tt> mnemonic codes the short (one byte) trap that invokes
the interrupt 3. <tt>into</tt> instruction invokes the interrupt 4 if the OF
flag is set.

<div class="p"><!----></div>
<a class="a" id="bound"></a>
<tt>bound</tt> verifies that the signed value contained in the specified
register lies within specified limits. An interrupt 5 occurs if the value
contained in the register is less than the lower bound or greater than the
upper bound. It needs two operands: the first operand specifies the register
being tested, the second operand should be the memory address of the two signed
limit values. The operands can be <tt>word</tt> or <tt>dword</tt> in size.

<pre>
................................................................................
</pre>

<div class="p"><!----></div>
     <a id="tth_sEc2.1.7"></a><h3>
2.1.7&nbsp;&nbsp;I/O instructions</h3>

<div class="p"><!----></div>
<a class="a" id="in"></a>
<tt>in</tt> transfers a byte, word, or double word from an input port to
<tt>al</tt>, <tt>ax</tt>, or <tt>eax</tt>. I/O ports can be addressed either
directly, with the immediate byte value coded in instruction, or indirectly
via the <tt>dx</tt> register. The destination operand should be <tt>al</tt>,
<tt>ax</tt>, or <tt>eax</tt> register. The source operand should be an immediate
value in the range from 0 to 255, or the <tt>dx</tt> register.

<pre>
&nbsp;&nbsp;&nbsp;&nbsp;in&nbsp;al,20h&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;;&nbsp;input&nbsp;byte&nbsp;from&nbsp;port&nbsp;20h
&nbsp;&nbsp;&nbsp;&nbsp;in&nbsp;ax,dx&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;;&nbsp;input&nbsp;word&nbsp;from&nbsp;port&nbsp;addressed&nbsp;by&nbsp;dx

</pre>

<div class="p"><!----></div>
<a class="a" id="out"></a>
<tt>out</tt> transfers a byte, word, or double word to an output port from
<tt>al</tt>, <tt>ax</tt>, or <tt>eax</tt>. The program can specify the number of
the port using the same methods as the <tt>in</tt> instruction. The destination
operand should be an immediate value in the range from 0 to 255, or the <tt>dx</tt>
register. The source operand should be <tt>al</tt>, <tt>ax</tt>, or <tt>eax</tt>
register.

<pre>
................................................................................
of string element, it should be <tt>b</tt> for byte element, <tt>w</tt> for word
element, and <tt>d</tt> for double word element. Full form of string operation
needs operands providing the size operator and the memory addresses, which
can be <tt>si</tt> or <tt>esi</tt> with any segment prefix, <tt>di</tt> or
<tt>edi</tt> always with <tt>es</tt> segment prefix.

<div class="p"><!----></div>
<a class="a" id="movs"></a>
<a class="a" id="movsb"></a>
<a class="a" id="movsw"></a>
<a class="a" id="movsd"></a>
<tt>movs</tt> transfers the string element pointed to by <tt>si</tt> (or
<tt>esi</tt>) to the location pointed to by <tt>di</tt> (or <tt>edi</tt>). Size of
operands can be <tt>byte</tt>, <tt>word</tt> or <tt>dword</tt>. The destination
operand should be memory addressed by <tt>di</tt> or <tt>edi</tt>, the source
operand should be memory addressed by <tt>si</tt> or <tt>esi</tt> with any
segment prefix.

<pre>
................................................................................
&nbsp;&nbsp;&nbsp;&nbsp;movs&nbsp;byte&nbsp;[di],[si]&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;;&nbsp;transfer&nbsp;byte
&nbsp;&nbsp;&nbsp;&nbsp;movs&nbsp;word&nbsp;[es:di],[ss:si]&nbsp;&nbsp;;&nbsp;transfer&nbsp;word
&nbsp;&nbsp;&nbsp;&nbsp;movsd&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;;&nbsp;transfer&nbsp;double&nbsp;word

</pre>

<div class="p"><!----></div>
<a class="a" id="cmps"></a>
<a class="a" id="cmpsb"></a>
<a class="a" id="cmpsw"></a>
<a class="a" id="cmpsd"></a>
<tt>cmps</tt> subtracts the destination string element from the source string
element and updates the flags AF, SF, PF, CF and OF, but it does not change
any of the compared elements. If the string elements are equal, ZF is set,
otherwise it is cleared. The first operand for this instruction should be the
source string element addressed by <tt>si</tt> or <tt>esi</tt> with any segment
prefix, the second operand should be the destination string element addressed
by <tt>di</tt> or <tt>edi</tt>.

................................................................................
&nbsp;&nbsp;&nbsp;&nbsp;cmpsb&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;;&nbsp;compare&nbsp;bytes
&nbsp;&nbsp;&nbsp;&nbsp;cmps&nbsp;word&nbsp;[ds:si],[es:di]&nbsp;&nbsp;;&nbsp;compare&nbsp;words
&nbsp;&nbsp;&nbsp;&nbsp;cmps&nbsp;dword&nbsp;[fs:esi],[edi]&nbsp;&nbsp;;&nbsp;compare&nbsp;double&nbsp;words

</pre>

<div class="p"><!----></div>
<a class="a" id="scas"></a>
<a class="a" id="scasb"></a>
<a class="a" id="scasw"></a>
<a class="a" id="scasd"></a>
<tt>scas</tt> subtracts the destination string element from <tt>al</tt>,
<tt>ax</tt>, or <tt>eax</tt> (depending on the size of string element) and
updates the flags AF, SF, ZF, PF, CF and OF. If the values are equal, ZF is
set, otherwise it is cleared. The operand should be the destination string
element addressed by <tt>di</tt> or <tt>edi</tt>.

<pre>
&nbsp;&nbsp;&nbsp;&nbsp;scas&nbsp;byte&nbsp;[es:di]&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;;&nbsp;scan&nbsp;byte
&nbsp;&nbsp;&nbsp;&nbsp;scasw&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;;&nbsp;scan&nbsp;word
&nbsp;&nbsp;&nbsp;&nbsp;scas&nbsp;dword&nbsp;[es:edi]&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;;&nbsp;scan&nbsp;double&nbsp;word

</pre>

<div class="p"><!----></div>
<a class="a" id="lods"></a>
<a class="a" id="lodsb"></a>
<a class="a" id="lodsw"></a>
<a class="a" id="lodsd"></a>
<tt>lods</tt> places the source string element into <tt>al</tt>, <tt>ax</tt>, or
<tt>eax</tt>. The operand should be the source string element addressed by
<tt>si</tt> or <tt>esi</tt> with any segment prefix.

<pre>
&nbsp;&nbsp;&nbsp;&nbsp;lods&nbsp;byte&nbsp;[ds:si]&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;;&nbsp;load&nbsp;byte
&nbsp;&nbsp;&nbsp;&nbsp;lods&nbsp;word&nbsp;[cs:si]&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;;&nbsp;load&nbsp;word
&nbsp;&nbsp;&nbsp;&nbsp;lodsd&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;;&nbsp;load&nbsp;double&nbsp;word

</pre>

<div class="p"><!----></div>
<a class="a" id="stos"></a>
<a class="a" id="stosb"></a>
<a class="a" id="stosw"></a>
<a class="a" id="stosd"></a>
<tt>stos</tt> places the value of <tt>al</tt>, <tt>ax</tt>, or <tt>eax</tt> into the
destination string element. Rules for the operand are the same as for the
<tt>scas</tt> instruction.
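
<div class="p"><!----></div>
Following the operand rules of <tt>scas</tt>, the corresponding forms of
<tt>stos</tt> might be written as in this sketch:

<pre>
&nbsp;&nbsp;&nbsp;&nbsp;stos&nbsp;byte&nbsp;[es:di]&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;;&nbsp;store&nbsp;byte
&nbsp;&nbsp;&nbsp;&nbsp;stosw&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;;&nbsp;store&nbsp;word
&nbsp;&nbsp;&nbsp;&nbsp;stos&nbsp;dword&nbsp;[es:edi]&nbsp;&nbsp;&nbsp;;&nbsp;store&nbsp;double&nbsp;word

</pre>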

<div class="p"><!----></div>
<a class="a" id="ins"></a>
<a class="a" id="insb"></a>
<a class="a" id="insw"></a>
<a class="a" id="insd"></a>
<tt>ins</tt> transfers a byte, word, or double word from an input port
addressed by <tt>dx</tt> register to the destination string element. The
destination operand should be memory addressed by <tt>di</tt> or <tt>edi</tt>,
the source operand should be the <tt>dx</tt> register.

<pre>
&nbsp;&nbsp;&nbsp;&nbsp;insb&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;;&nbsp;input&nbsp;byte
&nbsp;&nbsp;&nbsp;&nbsp;ins&nbsp;word&nbsp;[es:di],dx&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;;&nbsp;input&nbsp;word
&nbsp;&nbsp;&nbsp;&nbsp;ins&nbsp;dword&nbsp;[edi],dx&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;;&nbsp;input&nbsp;double&nbsp;word

</pre>

<div class="p"><!----></div>
<a class="a" id="outs"></a>
<a class="a" id="outsb"></a>
<a class="a" id="outsw"></a>
<a class="a" id="outsd"></a>
<tt>outs</tt> transfers the source string element to an output port addressed
by <tt>dx</tt> register. The destination operand should be the <tt>dx</tt>
register and the source operand should be memory addressed by <tt>si</tt> or
<tt>esi</tt> with any segment prefix.

<pre>
&nbsp;&nbsp;&nbsp;&nbsp;outs&nbsp;dx,byte&nbsp;[si]&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;;&nbsp;output&nbsp;byte
&nbsp;&nbsp;&nbsp;&nbsp;outsw&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;;&nbsp;output&nbsp;word
&nbsp;&nbsp;&nbsp;&nbsp;outs&nbsp;dx,dword&nbsp;[gs:esi]&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;;&nbsp;output&nbsp;double&nbsp;word

</pre>

<div class="p"><!----></div>
<a class="a" id="rep"></a>
<a class="a" id="repe"></a>
<a class="a" id="repz"></a>
<a class="a" id="repne"></a>
<a class="a" id="repnz"></a>
The repeat prefixes <tt>rep</tt>, <tt>repe</tt>/<tt>repz</tt>, and
<tt>repne</tt>/<tt>repnz</tt> specify repeated string operation. When a string
operation instruction has a repeat prefix, the operation is executed
repeatedly, each time using a different element of the string. The repetition
terminates when one of the conditions specified by the prefix is satisfied.
All three prefixes automatically decrease the <tt>cx</tt> or <tt>ecx</tt> register
(depending on whether the string operation instruction uses 16-bit or 32-bit
addressing) after each operation and repeat the associated operation until
................................................................................

<div class="p"><!----></div>
The flag control instructions provide a method for directly changing the
state of bits in the flag register. All instructions described in this
section have no operands.

<div class="p"><!----></div>
<a class="a" id="stc"></a>
<a class="a" id="clc"></a>
<a class="a" id="cmc"></a>
<a class="a" id="std"></a>
<a class="a" id="cld"></a>
<a class="a" id="sti"></a>
<a class="a" id="cli"></a>
<tt>stc</tt> sets the CF (carry flag) to 1, <tt>clc</tt> zeroes the CF,
<tt>cmc</tt> changes the CF to its complement. <tt>std</tt> sets the DF
(direction flag) to 1, <tt>cld</tt> zeroes the DF, <tt>sti</tt> sets the IF
(interrupt flag) to 1 and therefore enables the interrupts, <tt>cli</tt> zeroes
the IF and therefore disables the interrupts.

<div class="p"><!----></div>
<a class="a" id="lahf"></a>
<tt>lahf</tt> copies SF, ZF, AF, PF, and CF to bits 7, 6, 4, 2, and 0 of the
<tt>ah</tt> register. The contents of the remaining bits are undefined.
The flags remain unaffected.

<div class="p"><!----></div>
<a class="a" id="sahf"></a>
<tt>sahf</tt> transfers bits 7, 6, 4, 2, and 0 from the <tt>ah</tt> register
into SF, ZF, AF, PF, and CF.

<div class="p"><!----></div>
<a class="a" id="pushf"></a>
<a class="a" id="pushfw"></a>
<a class="a" id="pushfd"></a>
<tt>pushf</tt> decrements <tt>esp</tt> by two or four and stores the low word or
double word of the flags register at the top of the stack; the size of stored data
depends on the current code setting. The <tt>pushfw</tt> variant forces storing the
word and <tt>pushfd</tt> forces storing the double word.

<div class="p"><!----></div>
<a class="a" id="popf"></a>
<a class="a" id="popfw"></a>
<a class="a" id="popfd"></a>
<tt>popf</tt> transfers specific bits from the word or double word at the top
of the stack into the flags register, then increments <tt>esp</tt> by two or four; this value depends on
the current code setting. The <tt>popfw</tt> variant forces restoring from the word
and <tt>popfd</tt> forces restoring from the double word.
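
<div class="p"><!----></div>
A common pattern, sketched here under the assumptions of 32-bit code that
runs with enough privilege to change IF and of <tt>ebx</tt> pointing at some
shared double word, saves the flags, disables interrupts for a short
sequence, and then restores the previous state:

<pre>
&nbsp;&nbsp;&nbsp;&nbsp;pushfd&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;;&nbsp;save&nbsp;the&nbsp;flags,&nbsp;including&nbsp;IF
&nbsp;&nbsp;&nbsp;&nbsp;cli&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;;&nbsp;disable&nbsp;interrupts
&nbsp;&nbsp;&nbsp;&nbsp;inc&nbsp;dword&nbsp;[ebx]&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;;&nbsp;work&nbsp;that&nbsp;must&nbsp;not&nbsp;be&nbsp;interrupted
&nbsp;&nbsp;&nbsp;&nbsp;popfd&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;;&nbsp;restore&nbsp;the&nbsp;flags,&nbsp;and&nbsp;with&nbsp;them&nbsp;the&nbsp;previous&nbsp;IF

</pre>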

<div class="p"><!----></div>
     <a id="tth_sEc2.1.10"></a><h3>
2.1.10&nbsp;&nbsp;Conditional operations</h3>

<div class="p"><!----></div>
<a class="a" id="seto"></a>
<a class="a" id="setno"></a>
<a class="a" id="setc"></a>
<a class="a" id="setb"></a>
<a class="a" id="setnae"></a>
<a class="a" id="setnc"></a>
<a class="a" id="setae"></a>
<a class="a" id="setnb"></a>
<a class="a" id="sete"></a>
<a class="a" id="setz"></a>
<a class="a" id="setne"></a>
<a class="a" id="setnz"></a>
<a class="a" id="setbe"></a>
<a class="a" id="setna"></a>
<a class="a" id="seta"></a>
<a class="a" id="setnbe"></a>
<a class="a" id="sets"></a>
<a class="a" id="setns"></a>
<a class="a" id="setp"></a>
<a class="a" id="setpe"></a>
<a class="a" id="setnp"></a>
<a class="a" id="setpo"></a>
<a class="a" id="setl"></a>
<a class="a" id="setnge"></a>
<a class="a" id="setge"></a>
<a class="a" id="setnl"></a>
<a class="a" id="setle"></a>
<a class="a" id="setng"></a>
<a class="a" id="setg"></a>
<a class="a" id="setnle"></a>
The instructions obtained by attaching the condition mnemonic (see table
<a href="#tab:conditions">2.1</a>) to the <tt>set</tt> mnemonic set a byte to one if the
condition is true and set the byte to zero otherwise. The operand should be
an 8-bit general register or a byte in memory.

<pre>
&nbsp;&nbsp;&nbsp;&nbsp;setne&nbsp;al&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;;&nbsp;set&nbsp;al&nbsp;if&nbsp;zero&nbsp;flag&nbsp;cleared
&nbsp;&nbsp;&nbsp;&nbsp;seto&nbsp;byte&nbsp;[bx]&nbsp;&nbsp;&nbsp;;&nbsp;set&nbsp;byte&nbsp;if&nbsp;overflow

</pre>
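
<div class="p"><!----></div>
As a sketch of typical use, a comparison result can be turned into a 0 or 1
value in a full register. The choice of registers here is an assumption:

<pre>
&nbsp;&nbsp;&nbsp;&nbsp;xor&nbsp;eax,eax&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;;&nbsp;clear&nbsp;eax&nbsp;first,&nbsp;because&nbsp;xor&nbsp;itself&nbsp;changes&nbsp;the&nbsp;flags
&nbsp;&nbsp;&nbsp;&nbsp;cmp&nbsp;ecx,edx&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;;&nbsp;compare&nbsp;two&nbsp;signed&nbsp;values
&nbsp;&nbsp;&nbsp;&nbsp;setl&nbsp;al&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;;&nbsp;eax&nbsp;=&nbsp;1&nbsp;if&nbsp;ecx&nbsp;is&nbsp;less&nbsp;than&nbsp;edx,&nbsp;otherwise&nbsp;eax&nbsp;=&nbsp;0

</pre>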

<div class="p"><!----></div>
<a class="a" id="salc"></a>
<tt>salc</tt> instruction sets all bits of the <tt>al</tt> register when the
carry flag is set and zeroes the <tt>al</tt> register otherwise. This
instruction has no arguments.

<div class="p"><!----></div>
<a class="a" id="cmovo"></a>
<a class="a" id="cmovno"></a>
<a class="a" id="cmovc"></a>
<a class="a" id="cmovb"></a>
<a class="a" id="cmovnae"></a>
<a class="a" id="cmovnc"></a>
<a class="a" id="cmovae"></a>
<a class="a" id="cmovnb"></a>
<a class="a" id="cmove"></a>
<a class="a" id="cmovz"></a>
<a class="a" id="cmovne"></a>
<a class="a" id="cmovnz"></a>
<a class="a" id="cmovbe"></a>
<a class="a" id="cmovna"></a>
<a class="a" id="cmova"></a>
<a class="a" id="cmovnbe"></a>
<a class="a" id="cmovs"></a>
<a class="a" id="cmovns"></a>
<a class="a" id="cmovp"></a>
<a class="a" id="cmovpe"></a>
<a class="a" id="cmovnp"></a>
<a class="a" id="cmovpo"></a>
<a class="a" id="cmovl"></a>

<a class="a" id="cmovnge"></a>
<a class="a" id="cmovge"></a>
<a class="a" id="cmovnl"></a>
<a class="a" id="cmovle"></a>
<a class="a" id="cmovng"></a>
<a class="a" id="cmovg"></a>
<a class="a" id="cmovnle"></a>
The instructions obtained by attaching the condition mnemonic to
the <tt>cmov</tt> mnemonic transfer the word or double word from the general
register or memory to the general register only when the condition is true.
The destination operand should be a general register, the source operand can be
a general register or memory.

<pre>
&nbsp;&nbsp;&nbsp;&nbsp;cmove&nbsp;ax,bx&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;;&nbsp;move&nbsp;when&nbsp;zero&nbsp;flag&nbsp;set
&nbsp;&nbsp;&nbsp;&nbsp;cmovnc&nbsp;eax,[ebx]&nbsp;;&nbsp;move&nbsp;when&nbsp;carry&nbsp;flag&nbsp;cleared

</pre>
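
<div class="p"><!----></div>
A conditional move can replace a short branch. The following sketch, which
assumes two signed values in <tt>eax</tt> and <tt>ebx</tt>, leaves the smaller
one in <tt>eax</tt> without any jump:

<pre>
&nbsp;&nbsp;&nbsp;&nbsp;cmp&nbsp;eax,ebx&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;;&nbsp;compare&nbsp;two&nbsp;signed&nbsp;values
&nbsp;&nbsp;&nbsp;&nbsp;cmovg&nbsp;eax,ebx&nbsp;&nbsp;&nbsp;;&nbsp;if&nbsp;eax&nbsp;is&nbsp;greater,&nbsp;replace&nbsp;it&nbsp;with&nbsp;ebx
&nbsp;&nbsp;&nbsp;&nbsp;;&nbsp;eax&nbsp;now&nbsp;holds&nbsp;the&nbsp;signed&nbsp;minimum

</pre>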

<div class="p"><!----></div>
<a class="a" id="cmpxchg"></a>
<tt>cmpxchg</tt> compares the value in the <tt>al</tt>, <tt>ax</tt>, or <tt>eax</tt>
register with the destination operand. If the two values are equal,
the source operand is loaded into the destination operand. Otherwise,
the destination operand is loaded into the <tt>al</tt>, <tt>ax</tt>, or <tt>eax</tt>
register. The destination operand may be a general register or memory, the
source operand must be a general register.

<pre>
&nbsp;&nbsp;&nbsp;&nbsp;cmpxchg&nbsp;dl,bl&nbsp;&nbsp;&nbsp;&nbsp;;&nbsp;compare&nbsp;and&nbsp;exchange&nbsp;with&nbsp;register
&nbsp;&nbsp;&nbsp;&nbsp;cmpxchg&nbsp;[bx],dx&nbsp;&nbsp;;&nbsp;compare&nbsp;and&nbsp;exchange&nbsp;with&nbsp;memory

</pre>
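
<div class="p"><!----></div>
<tt>cmpxchg</tt> also sets ZF when the compared values are equal and clears
it otherwise, which allows a retry loop. The sketch below increments a
double word atomically; it assumes that <tt>ebx</tt> points at the shared
variable, and the label <tt>retry</tt> is hypothetical:

<pre>
&nbsp;&nbsp;&nbsp;&nbsp;mov&nbsp;eax,[ebx]&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;;&nbsp;expected&nbsp;old&nbsp;value
&nbsp;&nbsp;retry:
&nbsp;&nbsp;&nbsp;&nbsp;lea&nbsp;edx,[eax+1]&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;;&nbsp;desired&nbsp;new&nbsp;value
&nbsp;&nbsp;&nbsp;&nbsp;lock&nbsp;cmpxchg&nbsp;[ebx],edx&nbsp;&nbsp;;&nbsp;store&nbsp;edx&nbsp;only&nbsp;if&nbsp;[ebx]&nbsp;still&nbsp;equals&nbsp;eax
&nbsp;&nbsp;&nbsp;&nbsp;jne&nbsp;retry&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;;&nbsp;ZF&nbsp;=&nbsp;0:&nbsp;eax&nbsp;was&nbsp;reloaded&nbsp;from&nbsp;[ebx],&nbsp;try&nbsp;again

</pre>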

<div class="p"><!----></div>
<a class="a" id="cmpxchg8b"></a>
<tt>cmpxchg8b</tt> compares the 64-bit value in <tt>edx</tt> and <tt>eax</tt>
registers with the destination operand. If the values are equal, the 64-bit
value in <tt>ecx</tt> and <tt>ebx</tt> registers is stored in the destination
operand. Otherwise, the value in the destination operand is loaded into
<tt>edx</tt> and <tt>eax</tt> registers. The destination operand should be a
quad word in memory.

<pre>
................................................................................
</pre>

<div class="p"><!----></div>
     <a id="tth_sEc2.1.11"></a><h3>
2.1.11&nbsp;&nbsp;Miscellaneous instructions</h3>

<div class="p"><!----></div>
<a class="a" id="nop"></a>
<tt>nop</tt> instruction occupies one byte but affects nothing but the
instruction pointer. This instruction has no operands and doesn't perform any
operation.

<div class="p"><!----></div>
<a class="a" id="ud2"></a>
<tt>ud2</tt> instruction generates an invalid opcode exception. This
instruction is provided for software testing to explicitly generate an
invalid opcode. This instruction has no operands.

<div class="p"><!----></div>
<a class="a" id="xlat"></a>
<tt>xlat</tt> replaces a byte in the <tt>al</tt> register with a byte indexed by
its value in a translation table addressed by <tt>bx</tt> or <tt>ebx</tt>. The
operand should be a byte in memory addressed by <tt>bx</tt> or <tt>ebx</tt> with any
segment prefix. This instruction also has a short form, <tt>xlatb</tt>, which has
no operands and uses the <tt>bx</tt> or <tt>ebx</tt> address (depending on the
current code setting) in the segment selected by <tt>ds</tt>.

<div class="p"><!----></div>
<a class="a" id="lds"></a>
<a class="a" id="les"></a>
<a class="a" id="lfs"></a>
<a class="a" id="lgs"></a>
<a class="a" id="lss"></a>
<tt>lds</tt> transfers a pointer variable from the source operand to <tt>ds</tt>
and the destination register. The source operand must be a memory operand,
and the destination operand must be a general register. The <tt>ds</tt>
register receives the segment selector of the pointer while the destination
register receives the offset part of the pointer. <tt>les</tt>, <tt>lfs</tt>,
<tt>lgs</tt> and <tt>lss</tt> operate identically to <tt>lds</tt> except that
rather than the <tt>ds</tt> register, <tt>es</tt>, <tt>fs</tt>, <tt>gs</tt> and
<tt>ss</tt> are used respectively.
................................................................................

<pre>
&nbsp;&nbsp;&nbsp;&nbsp;lds&nbsp;bx,[si]&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;;&nbsp;load&nbsp;pointer&nbsp;to&nbsp;ds:bx

</pre>

<div class="p"><!----></div>
<a class="a" id="lea"></a>
<tt>lea</tt> transfers the offset of the source operand (rather than its value)
to the destination operand. The source operand must be a memory operand, and
the destination operand must be a general register.

<pre>
&nbsp;&nbsp;&nbsp;&nbsp;lea&nbsp;dx,[bx+si+1]&nbsp;;&nbsp;load&nbsp;effective&nbsp;address&nbsp;to&nbsp;dx

</pre>
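
<div class="p"><!----></div>
Since <tt>lea</tt> only computes an address and does not change the flags,
it is also often used for simple arithmetic. A small sketch, with the
registers chosen arbitrarily for illustration:

<pre>
&nbsp;&nbsp;&nbsp;&nbsp;lea&nbsp;eax,[eax+eax*4]&nbsp;&nbsp;&nbsp;;&nbsp;eax&nbsp;=&nbsp;eax*5,&nbsp;flags&nbsp;are&nbsp;not&nbsp;changed
&nbsp;&nbsp;&nbsp;&nbsp;lea&nbsp;edx,[ebx+ecx+8]&nbsp;&nbsp;&nbsp;;&nbsp;edx&nbsp;=&nbsp;ebx+ecx+8&nbsp;in&nbsp;a&nbsp;single&nbsp;instruction

</pre>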

<div class="p"><!----></div>
<a class="a" id="cpuid"></a>
<tt>cpuid</tt> returns processor identification and feature information in the
<tt>eax</tt>, <tt>ebx</tt>, <tt>ecx</tt>, and <tt>edx</tt> registers. The information
returned is selected by entering a value in the <tt>eax</tt> register before
the instruction is executed. This instruction has no operands.
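
<div class="p"><!----></div>
For instance, a sketch of querying function 0, which reports the highest
supported function number and the vendor identification string:

<pre>
&nbsp;&nbsp;&nbsp;&nbsp;xor&nbsp;eax,eax&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;;&nbsp;select&nbsp;function&nbsp;0
&nbsp;&nbsp;&nbsp;&nbsp;cpuid
&nbsp;&nbsp;&nbsp;&nbsp;;&nbsp;eax&nbsp;=&nbsp;highest&nbsp;supported&nbsp;function&nbsp;number
&nbsp;&nbsp;&nbsp;&nbsp;;&nbsp;ebx,&nbsp;edx,&nbsp;ecx&nbsp;=&nbsp;vendor&nbsp;identification&nbsp;string&nbsp;(in&nbsp;that&nbsp;order)

</pre>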

<div class="p"><!----></div>
<a class="a" id="pause"></a>
<tt>pause</tt> instruction delays the execution of the next instruction by an
implementation-specific amount of time. It can be used to improve the
performance of spin wait loops. This instruction has no operands.
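
<div class="p"><!----></div>
A spin wait loop of this kind might look like the following sketch, which
assumes that <tt>ebx</tt> points at a lock byte that another processor
eventually clears; the label <tt>spin</tt> is hypothetical:

<pre>
&nbsp;&nbsp;spin:
&nbsp;&nbsp;&nbsp;&nbsp;pause&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;;&nbsp;hint&nbsp;that&nbsp;this&nbsp;is&nbsp;a&nbsp;spin&nbsp;wait&nbsp;loop
&nbsp;&nbsp;&nbsp;&nbsp;cmp&nbsp;byte&nbsp;[ebx],0&nbsp;&nbsp;&nbsp;&nbsp;;&nbsp;check&nbsp;the&nbsp;lock&nbsp;byte
&nbsp;&nbsp;&nbsp;&nbsp;jne&nbsp;spin&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;;&nbsp;keep&nbsp;waiting&nbsp;until&nbsp;the&nbsp;byte&nbsp;becomes&nbsp;zero

</pre>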

<div class="p"><!----></div>
<a class="a" id="enter"></a>
<a class="a" id="leave"></a>
<tt>enter</tt> creates a stack frame that may be used to implement the scope
rules of block-structured high-level languages. A <tt>leave</tt> instruction
at the end of a procedure complements an <tt>enter</tt> at the beginning of the
procedure to simplify stack management and to control access to variables for
nested procedures. The <tt>enter</tt> instruction includes two parameters. The
first parameter specifies the number of bytes of dynamic storage to be
allocated on the stack for the routine being entered. The second parameter
corresponds to the lexical nesting level of the routine, it can be in range
................................................................................
</pre>

<div class="p"><!----></div>
     <a id="tth_sEc2.1.12"></a><h3>
2.1.12&nbsp;&nbsp;System instructions</h3>

<div class="p"><!----></div>
<a class="a" id="lmsw"></a>
<a class="a" id="smsw"></a>
<tt>lmsw</tt> loads the operand into the machine status word (bits 0 through 15
of <tt>cr0</tt> register), while <tt>smsw</tt> stores the machine status word
into the destination operand. The operand for both instructions can be a 16-bit
general register or memory; for <tt>smsw</tt> it can also be a 32-bit general
register.

<pre>
&nbsp;&nbsp;&nbsp;&nbsp;lmsw&nbsp;ax&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;;&nbsp;load&nbsp;machine&nbsp;status&nbsp;from&nbsp;register
&nbsp;&nbsp;&nbsp;&nbsp;smsw&nbsp;[bx]&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;;&nbsp;store&nbsp;machine&nbsp;status&nbsp;to&nbsp;memory

</pre>

<div class="p"><!----></div>
<a class="a" id="lgdt"></a>
<a class="a" id="lidt"></a>
<a class="a" id="sgdt"></a>
<a class="a" id="sidt"></a>
<tt>lgdt</tt> and <tt>lidt</tt> instructions load the values in operand into the
global descriptor table register or the interrupt descriptor table register
respectively. <tt>sgdt</tt> and <tt>sidt</tt> store the contents of the global
descriptor table register or the interrupt descriptor table register
in the destination operand. The operand should be 6 bytes in memory.

<pre>
&nbsp;&nbsp;&nbsp;&nbsp;lgdt&nbsp;[ebx]&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;;&nbsp;load&nbsp;global&nbsp;descriptor&nbsp;table

</pre>

<div class="p"><!----></div>
<a class="a" id="lldt"></a>
<a class="a" id="sldt"></a>
<a class="a" id="ltr"></a>
<a class="a" id="str"></a>
<tt>lldt</tt> loads the operand into the segment selector field of
the local descriptor table register and <tt>sldt</tt> stores the
segment selector from the local descriptor table register in the
operand. <tt>ltr</tt> loads the operand into the segment selector
field of the task register and <tt>str</tt> stores the segment
selector from the task register in the operand. Rules for operand
are the same as for the <tt>lmsw</tt> and <tt>smsw</tt> instructions.

<div class="p"><!----></div>
<a class="a" id="lar"></a>
<tt>lar</tt> loads the access rights from the segment descriptor
specified by the selector in source operand into the destination
operand and sets the ZF flag. The destination operand can be a
16-bit or 32-bit general register. The source operand should be a
16-bit general register or memory.

<pre>
&nbsp;&nbsp;&nbsp;&nbsp;lar&nbsp;ax,[bx]&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;;&nbsp;load&nbsp;access&nbsp;rights&nbsp;into&nbsp;word
&nbsp;&nbsp;&nbsp;&nbsp;lar&nbsp;eax,dx&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;;&nbsp;load&nbsp;access&nbsp;rights&nbsp;into&nbsp;double&nbsp;word

</pre>

<div class="p"><!----></div>
<a class="a" id="lsl"></a>
<tt>lsl</tt> loads the segment limit from the segment descriptor specified by
the selector in source operand into the destination operand and sets the ZF
flag. Rules for operand are the same as for the <tt>lar</tt> instruction.

<div class="p"><!----></div>
<a class="a" id="verr"></a>
<a class="a" id="verw"></a>
<tt>verr</tt> and <tt>verw</tt> verify whether the code or data segment specified
with the operand is readable or writable from the current privilege level.
The operand should be a word; it can be a general register or memory.
If the segment is accessible and readable (for <tt>verr</tt>) or writable (for
<tt>verw</tt>) the ZF flag is set, otherwise it's cleared. Rules for operand
are the same as for the <tt>lldt</tt> instruction.

<div class="p"><!----></div>
<a class="a" id="arpl"></a>
<tt>arpl</tt> compares the RPL (requestor's privilege level) fields of two
segment selectors. The first operand contains one segment selector and the
second operand contains the other. If the RPL field of the destination
operand is less than the RPL field of the source operand, the ZF flag is set
and the RPL field of the destination operand is increased to match that of
the source operand. Otherwise, the ZF flag is cleared and no change is made
to the destination operand. The destination operand can be a word general
register or memory, the source operand must be a general register.
................................................................................
<pre>
&nbsp;&nbsp;&nbsp;&nbsp;arpl&nbsp;bx,ax&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;;&nbsp;adjust&nbsp;RPL&nbsp;of&nbsp;selector&nbsp;in&nbsp;register
&nbsp;&nbsp;&nbsp;&nbsp;arpl&nbsp;[bx],ax&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;;&nbsp;adjust&nbsp;RPL&nbsp;of&nbsp;selector&nbsp;in&nbsp;memory

</pre>

<div class="p"><!----></div>
<a class="a" id="clts"></a>
<tt>clts</tt> clears the TS (task switched) flag in the <tt>cr0</tt> register.
This instruction has no operands.

<div class="p"><!----></div>
<a class="a" id="lock"></a>
<tt>lock</tt> prefix causes the processor's bus-lock signal to be asserted during
execution of the accompanying instruction. In a multiprocessor environment,
the bus-lock signal ensures that the processor has exclusive use of any shared
memory while the signal is asserted. The <tt>lock</tt> prefix can be prepended
only to the following instructions and only to those forms of the
instructions where the destination operand is a memory operand: <tt>add</tt>,
<tt>adc</tt>, <tt>and</tt>, <tt>btc</tt>, <tt>btr</tt>, <tt>bts</tt>, <tt>cmpxchg</tt>,
<tt>cmpxchg8b</tt>, <tt>dec</tt>, <tt>inc</tt>, <tt>neg</tt>, <tt>not</tt>, <tt>or</tt>,
................................................................................
operand is a memory operand, an undefined opcode exception may be generated.
An undefined opcode exception will also be generated if the <tt>lock</tt>
prefix is used with any instruction not in the above list. The <tt>xchg</tt>
instruction always asserts the bus-lock signal regardless of the presence or
absence of the <tt>lock</tt> prefix.
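
<div class="p"><!----></div>
A short sketch of the prefix applied to two of the instructions listed above, in
forms where the destination is a memory operand; the addressing through
<tt>ebx</tt> is an assumption made for the example:

<pre>
&nbsp;&nbsp;&nbsp;&nbsp;lock&nbsp;add&nbsp;dword&nbsp;[ebx],1&nbsp;&nbsp;;&nbsp;add&nbsp;to&nbsp;memory&nbsp;with&nbsp;bus&nbsp;lock&nbsp;asserted
&nbsp;&nbsp;&nbsp;&nbsp;lock&nbsp;cmpxchg&nbsp;[ebx],ecx&nbsp;&nbsp;;&nbsp;compare&nbsp;and&nbsp;exchange&nbsp;with&nbsp;bus&nbsp;lock&nbsp;asserted

</pre>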

<div class="p"><!----></div>
<a class="a" id="hlt"></a>
<tt>hlt</tt> stops instruction execution and places the processor in a halted
state. An enabled interrupt, a debug exception, the BINIT, INIT or the RESET
signal will resume execution. This instruction has no operands.

<div class="p"><!----></div>
<a class="a" id="invlpg"></a>
<tt>invlpg</tt> invalidates (flushes) the TLB (translation lookaside buffer)
entry specified with the operand, which should be a memory location. The processor
determines the page that contains that address and flushes the TLB entry for
that page.

<div class="p"><!----></div>
<a class="a" id="rdmsr"></a>
<a class="a" id="wrmsr"></a>
<tt>rdmsr</tt> loads the contents of a 64-bit MSR (model specific register)
of the address specified in the <tt>ecx</tt> register into registers <tt>edx</tt>
and <tt>eax</tt>. <tt>wrmsr</tt> writes the contents of registers <tt>edx</tt> and
<tt>eax</tt> into the 64-bit MSR of the address specified in the <tt>ecx</tt>
register. <tt>rdtsc</tt> loads the current value of the processor's time stamp
counter from the 64-bit MSR into the <tt>edx</tt> and <tt>eax</tt> registers.
The processor increments the time stamp counter MSR every clock cycle and
resets it to 0 whenever the processor is reset.

<div class="p"><!----></div>
<a class="a" id="rdpmc"></a>
<tt>rdpmc</tt> loads the contents of the 40-bit performance monitoring counter
specified in the <tt>ecx</tt> register into registers <tt>edx</tt> and
<tt>eax</tt>. These instructions have no operands.
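
<div class="p"><!----></div>
The following sketch reads the time stamp counter in two ways. It assumes that
the counter is the MSR at address 10h, which should be checked against the
documentation of the target processor, and that the <tt>rdmsr</tt> part runs at
privilege level 0:

<pre>
&nbsp;&nbsp;&nbsp;&nbsp;rdtsc&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;;&nbsp;load&nbsp;time&nbsp;stamp&nbsp;counter&nbsp;into&nbsp;edx&nbsp;and&nbsp;eax
&nbsp;&nbsp;&nbsp;&nbsp;mov&nbsp;ecx,10h&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;;&nbsp;address&nbsp;of&nbsp;MSR&nbsp;(assumed&nbsp;time&nbsp;stamp&nbsp;counter)
&nbsp;&nbsp;&nbsp;&nbsp;rdmsr&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;;&nbsp;load&nbsp;contents&nbsp;of&nbsp;that&nbsp;MSR&nbsp;into&nbsp;edx&nbsp;and&nbsp;eax

</pre>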

<div class="p"><!----></div>
<a class="a" id="wbinvd"></a>
<tt>wbinvd</tt> writes back all modified cache lines in the processor's
internal cache to main memory and invalidates (flushes) the internal caches.
The instruction then issues a special function bus cycle that directs
external caches to also write back modified data and another bus cycle to
indicate that the external caches should be invalidated. This instruction has
no operands.

<div class="p"><!----></div>
<a class="a" id="rsm"></a>
<tt>rsm</tt> returns program control from the system management mode to the
program that was interrupted when the processor received an SMM interrupt.
This instruction has no operands.

<div class="p"><!----></div>
<a class="a" id="sysenter"></a>
<a class="a" id="sysexit"></a>
<tt>sysenter</tt> executes a fast call to a level 0 system procedure, <tt>sysexit</tt>
executes a fast return to level 3 user code. The addresses used by these instructions
are stored in MSRs. These instructions have no operands.

<div class="p"><!----></div>
     <a id="tth_sEc2.1.13"></a><h3>
2.1.13&nbsp;&nbsp;FPU instructions</h3>

................................................................................
the stack and each of them holds the double extended precision floating-point
value. When some values are pushed onto the stack or are removed from the top,
the FPU registers are shifted, so <tt>st0</tt> is always the value on
the top of FPU stack, <tt>st1</tt> is the first value below the top, etc.
The <tt>st0</tt> name has also the synonym <tt>st</tt>.

<div class="p"><!----></div>
<a class="a" id="fld"></a>
<tt>fld</tt> pushes the floating-point value onto the FPU register stack.
The operand can be 32-bit, 64-bit or 80-bit memory location or the
FPU register, its value is then loaded onto the top of FPU register stack
(the <tt>st0</tt> register) and is automatically converted into the
double extended precision format.

<pre>
&nbsp;&nbsp;&nbsp;&nbsp;fld&nbsp;dword&nbsp;[bx]&nbsp;&nbsp;&nbsp;;&nbsp;load&nbsp;single&nbsp;precision&nbsp;value&nbsp;from&nbsp;memory
&nbsp;&nbsp;&nbsp;&nbsp;fld&nbsp;st2&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;;&nbsp;push&nbsp;value&nbsp;of&nbsp;st2&nbsp;onto&nbsp;register&nbsp;stack

</pre>

<div class="p"><!----></div>
<a class="a" id="fld1"></a>
<a class="a" id="lfdz"></a>
<a class="a" id="ldl2t"></a>

<a class="a" id="lfdl2e"></a>
<a class="a" id="fldpi"></a>
<a class="a" id="fldlg2"></a>
<a class="a" id="fldln2"></a>
<tt>fld1</tt>, <tt>fldz</tt>, <tt>fldl2t</tt>, <tt>fldl2e</tt>, <tt>fldpi</tt>,
<tt>fldlg2</tt> and <tt>fldln2</tt> load the commonly used constants onto the
FPU register stack. The loaded constants are +1.0, +0.0, log<sub>2</sub>10,
log<sub>2</sub>e, &#960;, log<sub>10</sub>2 and ln2 respectively. These instructions
have no operands.

<div class="p"><!----></div>
<a class="a" id="fild"></a>
<tt>fild</tt> converts the signed integer source operand into double extended
precision floating-point format and pushes the result onto the FPU register
stack. The source operand can be a 16-bit, 32-bit or 64-bit memory location.

<pre>
&nbsp;&nbsp;&nbsp;&nbsp;fild&nbsp;qword&nbsp;[bx]&nbsp;&nbsp;;&nbsp;load&nbsp;64-bit&nbsp;integer&nbsp;from&nbsp;memory

</pre>

<div class="p"><!----></div>
<a class="a" id="fst"></a>
<a class="a" id="fstp"></a>
<tt>fst</tt> copies the value of <tt>st0</tt> register to the destination operand,
which can be 32-bit or 64-bit memory location or another FPU register.
<tt>fstp</tt> performs the same operation as <tt>fst</tt> and then pops the register
stack, getting rid of <tt>st0</tt>. <tt>fstp</tt> accepts the same operands as
the <tt>fst</tt> instruction and can also store value in the 80-bit memory.

<pre>
&nbsp;&nbsp;&nbsp;&nbsp;fst&nbsp;st3&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;;&nbsp;copy&nbsp;value&nbsp;of&nbsp;st0&nbsp;into&nbsp;st3&nbsp;register
&nbsp;&nbsp;&nbsp;&nbsp;fstp&nbsp;tword&nbsp;[bx]&nbsp;&nbsp;;&nbsp;store&nbsp;value&nbsp;in&nbsp;memory&nbsp;and&nbsp;pop&nbsp;stack

</pre>

<div class="p"><!----></div>
<a class="a" id="fist"></a>
<tt>fist</tt> converts the value in <tt>st0</tt> to a signed integer and stores
the result in the destination operand. The operand can be 16-bit or
32-bit memory location. <tt>fistp</tt> performs the same operation and then
pops the register stack, it accepts the same operands as the <tt>fist</tt>
instruction and can also store integer value in the 64-bit memory, so it
has the same rules for operands as <tt>fild</tt> instruction.
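
<div class="p"><!----></div>
A minimal sketch of both instructions, assuming the destination in memory is
addressed by <tt>bx</tt>:

<pre>
&nbsp;&nbsp;&nbsp;&nbsp;fist&nbsp;word&nbsp;[bx]&nbsp;&nbsp;&nbsp;&nbsp;;&nbsp;store&nbsp;st0&nbsp;as&nbsp;16-bit&nbsp;integer&nbsp;in&nbsp;memory
&nbsp;&nbsp;&nbsp;&nbsp;fistp&nbsp;qword&nbsp;[bx]&nbsp;&nbsp;;&nbsp;store&nbsp;st0&nbsp;as&nbsp;64-bit&nbsp;integer&nbsp;and&nbsp;pop&nbsp;stack

</pre>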

<div class="p"><!----></div>
<a class="a" id="fbld"></a>
<tt>fbld</tt> converts the packed BCD integer into double extended precision
floating-point format and pushes this value onto the FPU stack. <tt>fbstp</tt>
converts the value in <tt>st0</tt> to an 18-digit packed BCD integer, stores the
result in the destination operand, and pops the register stack. The operand
should be an 80-bit memory location.

<div class="p"><!----></div>
<a class="a" id="fadd"></a>
<tt>fadd</tt> adds the destination and source operand and stores the sum in the
destination location. The destination operand is always an FPU register, if the
source is a memory location, the destination is <tt>st0</tt> register and only
source operand should be specified. If both operands are FPU registers, at
least one of them should be <tt>st0</tt> register. An operand in memory can be
a 32-bit or 64-bit value.

<pre>
&nbsp;&nbsp;&nbsp;&nbsp;fadd&nbsp;qword&nbsp;[bx]&nbsp;&nbsp;;&nbsp;add&nbsp;double&nbsp;precision&nbsp;value&nbsp;to&nbsp;st0
&nbsp;&nbsp;&nbsp;&nbsp;fadd&nbsp;st2,st0&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;;&nbsp;add&nbsp;st0&nbsp;to&nbsp;st2

</pre>

<div class="p"><!----></div>
<a class="a" id="faddp"></a>
<tt>faddp</tt> adds the destination and source operand, stores the sum in the
destination location and then pops the register stack. The destination operand
must be an FPU register and the source operand must be the <tt>st0</tt>. When
no operands are specified, <tt>st1</tt> is used as a destination operand.

<pre>
&nbsp;&nbsp;&nbsp;&nbsp;faddp&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;;&nbsp;add&nbsp;st0&nbsp;to&nbsp;st1&nbsp;and&nbsp;pop&nbsp;the&nbsp;stack
&nbsp;&nbsp;&nbsp;&nbsp;faddp&nbsp;st2,st0&nbsp;&nbsp;&nbsp;&nbsp;;&nbsp;add&nbsp;st0&nbsp;to&nbsp;st2&nbsp;and&nbsp;pop&nbsp;the&nbsp;stack

</pre>

<div class="p"><!----></div>
<a class="a" id="fiadd"></a>
<tt>fiadd</tt> instruction converts an integer source operand into double
extended precision floating-point value and adds it to the destination
operand. The operand should be a 16-bit or 32-bit memory location.

<pre>
&nbsp;&nbsp;&nbsp;&nbsp;fiadd&nbsp;word&nbsp;[bx]&nbsp;&nbsp;;&nbsp;add&nbsp;word&nbsp;integer&nbsp;to&nbsp;st0

</pre>

<div class="p"><!----></div>
<a class="a" id="fsub"></a>
<a class="a" id="fsubr"></a>
<a class="a" id="fmul"></a>
<a class="a" id="fdiv"></a>
<a class="a" id="fdivr"></a>
<tt>fsub</tt>, <tt>fsubr</tt>, <tt>fmul</tt>, <tt>fdiv</tt>, <tt>fdivr</tt> instructions
are similar to <tt>fadd</tt>, have the same rules for operands and differ only in
the performed computation. <tt>fsub</tt> subtracts the source operand from the
destination operand, <tt>fsubr</tt> subtracts the destination operand from the
source operand, <tt>fmul</tt> multiplies the destination and source operands,
<tt>fdiv</tt> divides the destination operand by the source operand and <tt>fdivr</tt>
divides the source operand by the destination operand. <tt>fsubp</tt>, <tt>fsubrp</tt>,
<tt>fmulp</tt>, <tt>fdivp</tt>, <tt>fdivrp</tt> perform the same operations and pop the
register stack, the rules for operand are the same as for the <tt>faddp</tt>
instruction. <tt>fisub</tt>, <tt>fisubr</tt>, <tt>fimul</tt>, <tt>fidiv</tt>, <tt>fidivr</tt>
perform these operations after converting the integer source operand into
floating-point value, they have the same rules for operands as <tt>fiadd</tt>
instruction.
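
<div class="p"><!----></div>
To make the operand order concrete, here is a small sketch; the registers and
the memory operand are arbitrary choices for the example:

<pre>
&nbsp;&nbsp;&nbsp;&nbsp;fsubr&nbsp;st0,st2&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;;&nbsp;subtract&nbsp;st0&nbsp;from&nbsp;st2,&nbsp;store&nbsp;result&nbsp;in&nbsp;st0
&nbsp;&nbsp;&nbsp;&nbsp;fdivp&nbsp;st1,st0&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;;&nbsp;divide&nbsp;st1&nbsp;by&nbsp;st0&nbsp;and&nbsp;pop&nbsp;the&nbsp;stack
&nbsp;&nbsp;&nbsp;&nbsp;fimul&nbsp;dword&nbsp;[bx]&nbsp;&nbsp;;&nbsp;multiply&nbsp;st0&nbsp;by&nbsp;32-bit&nbsp;integer

</pre>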

<div class="p"><!----></div>
<a class="a" id="fsqrt"></a>
<a class="a" id="fsin"></a>
<a class="a" id="fcos"></a>
<a class="a" id="fchs"></a>
<a class="a" id="fabs"></a>

<a class="a" id="frndint"></a>
<a class="a" id="f2xm1"></a>
<tt>fsqrt</tt> computes the square root of the value in <tt>st0</tt> register,
<tt>fsin</tt> computes the sine of that value, <tt>fcos</tt> computes the cosine
of that value, <tt>fchs</tt> complements its sign bit, <tt>fabs</tt> clears its sign to
create the absolute value, <tt>frndint</tt> rounds it to the nearest integral value,
depending on the current rounding mode. <tt>f2xm1</tt> computes the exponential value
of 2 to the power of <tt>st0</tt> and subtracts the 1.0 from it, the value of
<tt>st0</tt> must lie in the range &#8722;1.0 to +1.0.
All these instructions store the result in <tt>st0</tt> and have no operands.

<div class="p"><!----></div>

<a class="a" id="fsincos"></a>
<a class="a" id="fptan"></a>
<a class="a" id="fpatan"></a>
<a class="a" id="fyl2x"></a>
<a class="a" id="fyl2xp1"></a>
<a class="a" id="fprem"></a>

<a class="a" id="fprem1"></a>
<a class="a" id="fscale"></a>
<a class="a" id="fxtract"></a>
<a class="a" id="fnop"></a>
<tt>fsincos</tt> computes both the sine and the cosine of the value in
<tt>st0</tt> register, stores the sine in <tt>st0</tt> and pushes the cosine on the
top of FPU register stack. <tt>fptan</tt> computes the tangent of the value in
<tt>st0</tt>, stores the result in <tt>st0</tt> and pushes a 1.0 onto the FPU register
stack. <tt>fpatan</tt> computes the arctangent of the value in <tt>st1</tt> divided by
the value in <tt>st0</tt>, stores the result in <tt>st1</tt> and pops the FPU register
stack. <tt>fyl2x</tt> computes the binary logarithm of <tt>st0</tt>, multiplies it by
<tt>st1</tt>, stores the result in <tt>st1</tt> and pops the FPU register stack;
................................................................................
computes the remainder in the way specified by IEEE Standard 754. <tt>fscale</tt>
truncates the value in <tt>st1</tt> and increases the exponent of <tt>st0</tt> by this value.
<tt>fxtract</tt> separates the value in <tt>st0</tt> into its exponent and significand,
stores the exponent in <tt>st0</tt> and pushes the significand onto the register
stack. <tt>fnop</tt> performs no operation. These instructions have no operands.
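
<div class="p"><!----></div>
Because several of these instructions push or pop the register stack, a short
sketch of where the results end up may help. The value loaded from memory
addressed by <tt>bx</tt> is an assumption made for the example:

<pre>
&nbsp;&nbsp;&nbsp;&nbsp;fld1&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;;&nbsp;st0&nbsp;holds&nbsp;1.0
&nbsp;&nbsp;&nbsp;&nbsp;fld&nbsp;dword&nbsp;[bx]&nbsp;&nbsp;&nbsp;;&nbsp;st0&nbsp;holds&nbsp;loaded&nbsp;value,&nbsp;st1&nbsp;holds&nbsp;1.0
&nbsp;&nbsp;&nbsp;&nbsp;fyl2x&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;;&nbsp;st0&nbsp;holds&nbsp;binary&nbsp;logarithm&nbsp;of&nbsp;loaded&nbsp;value
&nbsp;&nbsp;&nbsp;&nbsp;fldpi&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;;&nbsp;push&nbsp;pi&nbsp;onto&nbsp;register&nbsp;stack
&nbsp;&nbsp;&nbsp;&nbsp;fsincos&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;;&nbsp;st0&nbsp;holds&nbsp;cosine&nbsp;of&nbsp;pi,&nbsp;st1&nbsp;holds&nbsp;sine&nbsp;of&nbsp;pi

</pre>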

<div class="p"><!----></div>
<a class="a" id="fxch"></a>
<tt>fxch</tt> exchanges the contents of <tt>st0</tt> and another FPU register. The
operand should be an FPU register, if no operand is specified, the contents of
<tt>st0</tt> and <tt>st1</tt> are exchanged.

<div class="p"><!----></div>
<a class="a" id="fcom"></a>
<a class="a" id="fcomp"></a>
<tt>fcom</tt> and <tt>fcomp</tt> compare the contents of <tt>st0</tt> and the source
operand and set flags in the FPU status word according to the results.
<tt>fcomp</tt> additionally pops the register stack after performing the comparison.
The operand can be a single or double precision value in memory or the FPU register.
When no operand is specified, <tt>st1</tt> is used as a source operand.

<pre>
&nbsp;&nbsp;&nbsp;&nbsp;fcom&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;;&nbsp;compare&nbsp;st0&nbsp;with&nbsp;st1
&nbsp;&nbsp;&nbsp;&nbsp;fcomp&nbsp;st2&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;;&nbsp;compare&nbsp;st0&nbsp;with&nbsp;st2&nbsp;and&nbsp;pop&nbsp;stack

</pre>

<div class="p"><!----></div>
<a class="a" id="fcompp"></a>
<tt>fcompp</tt> compares the contents of <tt>st0</tt> and <tt>st1</tt>, sets flags in the
FPU status word according to the results and pops the register stack twice.
This instruction has no operands.

<div class="p"><!----></div>
<a class="a" id="fucom"></a>
<a class="a" id="fucomp"></a>
<a class="a" id="fucompp"></a>
<tt>fucom</tt>, <tt>fucomp</tt> and <tt>fucompp</tt> perform an unordered comparison of
two FPU registers. Rules for operands are the same as for the <tt>fcom</tt>,
<tt>fcomp</tt> and <tt>fcompp</tt>, but the source operand must be an FPU register.

<div class="p"><!----></div>
<a class="a" id="ficom"></a>
<a class="a" id="ficomp"></a>
<tt>ficom</tt> and <tt>ficomp</tt> compare the value in <tt>st0</tt> with an integer
source operand and set the flags in the FPU status word according to the results.
<tt>ficomp</tt> additionally pops the register stack after performing the comparison.
The integer value is converted to double extended precision floating-point format
before the comparison is made. The operand should be a 16-bit or 32-bit memory
location.

<pre>
&nbsp;&nbsp;&nbsp;&nbsp;ficom&nbsp;word&nbsp;[bx]&nbsp;&nbsp;;&nbsp;compare&nbsp;st0&nbsp;with&nbsp;16-bit&nbsp;integer

</pre>

<div class="p"><!----></div>
<a class="a" id="fcomi"></a>
<a class="a" id="fcomip"></a>
<a class="a" id="fucomi"></a>
<a class="a" id="fucomip"></a>
<tt>fcomi</tt>, <tt>fcomip</tt>, <tt>fucomi</tt>, <tt>fucomip</tt> perform the comparison
of <tt>st0</tt> with another FPU register and set the ZF, PF and CF flags according to
the results. <tt>fcomip</tt> and <tt>fucomip</tt> additionally pop the register stack
after performing the comparison.

<div class="p"><!----></div>

<a class="a" id="fcmovb"></a>
<a class="a" id="fcmove"></a>
<a class="a" id="fcmovbe"></a>
<a class="a" id="fcmovu"></a>
<a class="a" id="fcmovnb"></a>
<a class="a" id="fcmovne"></a>
<a class="a" id="fcmovnbe"></a>
<a class="a" id="fcmovnu"></a>
The instructions obtained by attaching the FPU
condition mnemonic (see table 2.2) to the <tt>fcmov</tt> mnemonic
transfer the specified FPU register into <tt>st0</tt> register if the given test
condition is true. These instructions allow two different syntaxes, one with single
operand specifying the source FPU register, and one with two operands, in that case
destination operand should be <tt>st0</tt> register and the second operand specifies
the source FPU register.

................................................................................

<div class="p"><!----></div>

<div class="p"><!----></div>
<a id="tth_tAb2.2">
</a> 
<div style="text-align:center">
<table border="1" class="tabular">
<tr><td align="center">Mnemonic </td><td align="center">Condition tested </td><td align="center">Description </td></tr><tr><td></td></tr>
<tr><td align="center"><tt>b</tt> </td><td align="center">CF = 1 </td><td align="center">below</td></tr>
<tr><td align="center"><tt>e</tt> </td><td align="center">ZF = 1 </td><td align="center">equal</td></tr>
<tr><td align="center"><tt>be</tt> </td><td align="center">CF <tt>or</tt> ZF = 1 </td><td align="center">below or equal</td></tr>
<tr><td align="center"><tt>u</tt> </td><td align="center">PF = 1 </td><td align="center">unordered</td></tr>
<tr><td align="center"><tt>nb</tt> </td><td align="center">CF = 0 </td><td align="center">not below</td></tr>
<tr><td align="center"><tt>ne</tt> </td><td align="center">ZF = 0 </td><td align="center">not equal</td></tr>
................................................................................
</div>

<div style="text-align:center">Table 2.2: FPU conditions.</div>
<a id="tab:FPU_conditions">
</a>

<div class="p"><!----></div>
<a class="a" id="ftst"></a>
<a class="a" id="fxam"></a>
<tt>ftst</tt> compares the value in <tt>st0</tt> with 0.0 and sets the flags in the
FPU status word according to the results. <tt>fxam</tt> examines the contents of the
<tt>st0</tt> and sets the flags in FPU status word to indicate the class of value in
the register. These instructions have no operands.

<div class="p"><!----></div>
<a class="a" id="fstsw"></a>
<a class="a" id="fnstsw"></a>
<tt>fstsw</tt> and <tt>fnstsw</tt> store the current value of the FPU status word in the
destination location. The destination operand can be either a 16-bit memory or the
<tt>ax</tt> register. <tt>fstsw</tt> checks for pending unmasked FPU exceptions before
storing the status word, <tt>fnstsw</tt> does not.
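
<div class="p"><!----></div>
One common use, sketched here under the assumption that a comparison of
<tt>st0</tt> with <tt>st1</tt> is wanted, is to copy the FPU condition bits into
the CPU flags so that ordinary conditional jumps can follow:

<pre>
&nbsp;&nbsp;&nbsp;&nbsp;fcom&nbsp;st1&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;;&nbsp;compare&nbsp;st0&nbsp;with&nbsp;st1
&nbsp;&nbsp;&nbsp;&nbsp;fnstsw&nbsp;ax&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;;&nbsp;store&nbsp;FPU&nbsp;status&nbsp;word&nbsp;in&nbsp;ax
&nbsp;&nbsp;&nbsp;&nbsp;sahf&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;;&nbsp;copy&nbsp;C0,&nbsp;C2&nbsp;and&nbsp;C3&nbsp;from&nbsp;ah&nbsp;into&nbsp;CF,&nbsp;PF&nbsp;and&nbsp;ZF

</pre>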

<div class="p"><!----></div>
<a class="a" id="fstcw"></a>
<a class="a" id="fnstcw"></a>
<tt>fstcw</tt> and <tt>fnstcw</tt> store the current value of the FPU control word
at the specified destination in memory. <tt>fstcw</tt> checks for pending unmasked FPU
exceptions before storing the control word, <tt>fnstcw</tt> does not. <tt>fldcw</tt> loads
the operand into the FPU control word. The operand should be a 16-bit memory
location.
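
<div class="p"><!----></div>
As a sketch of how these instructions are typically combined, the following
changes the rounding mode to truncation. It assumes a scratch word in memory
addressed by <tt>bx</tt>:

<pre>
&nbsp;&nbsp;&nbsp;&nbsp;fstcw&nbsp;word&nbsp;[bx]&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;;&nbsp;store&nbsp;control&nbsp;word&nbsp;in&nbsp;memory
&nbsp;&nbsp;&nbsp;&nbsp;or&nbsp;word&nbsp;[bx],0C00h&nbsp;&nbsp;;&nbsp;set&nbsp;rounding&nbsp;control&nbsp;bits&nbsp;to&nbsp;round&nbsp;toward&nbsp;zero
&nbsp;&nbsp;&nbsp;&nbsp;fldcw&nbsp;word&nbsp;[bx]&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;;&nbsp;load&nbsp;modified&nbsp;control&nbsp;word

</pre>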

<div class="p"><!----></div>

<a class="a" id="fstenv"></a>
<a class="a" id="fnstenv"></a>
<a class="a" id="fldenv"></a>

<a class="a" id="fsave"></a>
<a class="a" id="fnsave"></a>
<a class="a" id="frstor"></a>
<a class="a" id="fstenvw"></a>
<a class="a" id="fnstenvw"></a>
<a class="a" id="fldenvw"></a>
<a class="a" id="fsavew"></a>
<a class="a" id="fnsavew"></a>
<a class="a" id="frstorw"></a>
<a class="a" id="fstenvd"></a>
<a class="a" id="fnstenvd"></a>
<a class="a" id="fldenvd"></a>
<a class="a" id="fsaved"></a>
<a class="a" id="fnsaved"></a>
<a class="a" id="frstord"></a>
<tt>fstenv</tt> and <tt>fnstenv</tt> store the current FPU operating environment at
the memory location specified with the destination operand, and then mask all
FPU exceptions. <tt>fstenv</tt> checks for pending unmasked FPU exceptions before
proceeding, <tt>fnstenv</tt> does not. <tt>fldenv</tt> loads the complete operating
environment from memory into the FPU. <tt>fsave</tt> and <tt>fnsave</tt>
store the current FPU state (operating environment and register stack) at the
specified destination in memory and reinitialize the FPU. <tt>fsave</tt> checks
for pending unmasked FPU exceptions before proceeding, <tt>fnsave</tt> does not.
................................................................................
exist two additional mnemonics that allow precise selection of the type of the
operation. The <tt>fstenvw</tt>, <tt>fnstenvw</tt>, <tt>fldenvw</tt>, <tt>fsavew</tt>, <tt>fnsavew</tt> and
<tt>frstorw</tt> mnemonics force the instruction to perform operation as in the 16-bit
mode, while <tt>fstenvd</tt>, <tt>fnstenvd</tt>, <tt>fldenvd</tt>, <tt>fsaved</tt>, <tt>fnsaved</tt> and <tt>frstord</tt>
force the operation as in 32-bit mode.

<div class="p"><!----></div>
<a class="a" id="finit"></a>
<a class="a" id="fninit"></a>
<a class="a" id="fclex"></a>

<a class="a" id="fnclex"></a>
<a class="a" id="wait"></a>
<a class="a" id="fwait"></a>
<tt>finit</tt> and <tt>fninit</tt> set the FPU operating environment into its default
state. <tt>finit</tt> checks for pending unmasked FPU exceptions before proceeding,
<tt>fninit</tt> does not. <tt>fclex</tt> and <tt>fnclex</tt> clear the FPU exception flags in the FPU
status word. <tt>fclex</tt> checks for pending unmasked FPU exceptions before proceeding,
<tt>fnclex</tt> does not. <tt>wait</tt> and <tt>fwait</tt> are synonyms for the same
instruction, which causes the processor to check for pending unmasked FPU exceptions
and handle them before proceeding. These instructions have no operands.

<div class="p"><!----></div>
<a class="a" id="ffree"></a>
<tt>ffree</tt> sets the tag associated with specified FPU register to empty. The
operand should be an FPU register.

<div class="p"><!----></div>

<a class="a" id="fincstp"></a>
<a class="a" id="fdecstp"></a>
<tt>fincstp</tt> and <tt>fdecstp</tt> rotate the FPU stack by one by adding or
subtracting one to the pointer of the top of stack. These instructions have no
operands.

<div class="p"><!----></div>
     <a id="tth_sEc2.1.14"></a><h3>
2.1.14&nbsp;&nbsp;MMX instructions</h3>
<a id="sec:MMX_instructions">
</a>
................................................................................
which are the low 64-bit parts of the 80-bit FPU registers. Because of this MMX
instructions cannot be used at the same time as FPU instructions. They can operate
on packed bytes (eight 8-bit integers), packed words (four 16-bit integers) or
packed double words (two 32-bit integers), the use of packed formats allows
operations to be performed on multiple data elements at one time.

<div class="p"><!----></div>
<a class="a" id="movq"></a>
<tt>movq</tt> copies a quad word from the source operand to the destination operand.
At least one of the operands must be a MMX register, the second one can be also
a MMX register or 64-bit memory location.

<pre>
&nbsp;&nbsp;&nbsp;&nbsp;movq&nbsp;mm0,mm1&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;;&nbsp;move&nbsp;quad&nbsp;word&nbsp;from&nbsp;register&nbsp;to&nbsp;register
&nbsp;&nbsp;&nbsp;&nbsp;movq&nbsp;mm2,[ebx]&nbsp;&nbsp;&nbsp;;&nbsp;move&nbsp;quad&nbsp;word&nbsp;from&nbsp;memory&nbsp;to&nbsp;register

</pre>

<div class="p"><!----></div>
<a class="a" id="movd"></a>
<tt>movd</tt> copies a double word from the source operand to the destination operand.
One of the operands must be a MMX register, the second one can be a general register
or 32-bit memory location. Only low double word of MMX register is used.
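
<div class="p"><!----></div>
A brief sketch of both directions of the transfer; the registers are arbitrary
choices for the example:

<pre>
&nbsp;&nbsp;&nbsp;&nbsp;movd&nbsp;mm0,eax&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;;&nbsp;move&nbsp;double&nbsp;word&nbsp;from&nbsp;general&nbsp;register&nbsp;to&nbsp;mm0
&nbsp;&nbsp;&nbsp;&nbsp;movd&nbsp;[ebx],mm1&nbsp;&nbsp;&nbsp;;&nbsp;move&nbsp;low&nbsp;double&nbsp;word&nbsp;of&nbsp;mm1&nbsp;to&nbsp;memory

</pre>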

<div class="p"><!----></div>
All general MMX operations have two operands, the destination operand should be
a MMX register, the source operand can be a MMX register or 64-bit memory location.
Operation is performed on the corresponding data elements of the source and destination
operand and stored in the data elements of the destination operand.
<a class="a" id="paddb"></a>
<a class="a" id="paddw"></a>

<a class="a" id="paddd"></a>

<tt>paddb</tt>, <tt>paddw</tt> and <tt>paddd</tt> perform the addition of packed bytes,
packed words, or packed double words.  
<a class="a" id="psubb"></a>
<a class="a" id="psubw"></a>

<a class="a" id="psubd"></a>

<tt>psubb</tt>, <tt>psubw</tt> and <tt>psubd</tt> perform the subtraction of appropriate types. 

<a class="a" id="paddsb"></a>
<a class="a" id="paddsw"></a>
<a class="a" id="psubsb"></a>
<a class="a" id="psubsw"></a>

<tt>paddsb</tt>, <tt>paddsw</tt>, <tt>psubsb</tt> and <tt>psubsw</tt> perform the addition or 
subtraction of packed bytes or packed words with the signed saturation. 

<a class="a" id="paddusb"></a>
<a class="a" id="paddusw"></a>
<a class="a" id="psubusb"></a>
<a class="a" id="psubusw"></a>

<tt>paddusb</tt>, <tt>paddusw</tt>, <tt>psubusb</tt>, <tt>psubusw</tt> are analogous, but with 
unsigned saturation.
<a class="a" id="pmulhw"></a>
<a class="a" id="pmullw"></a>
<tt>pmulhw</tt> and <tt>pmullw</tt> perform a signed multiplication of the packed words
and store the high or low words of the results in the destination operand.

<a class="a" id="pmaddwd"></a>

<tt>pmaddwd</tt> performs a multiplication of the packed words and adds the four intermediate
double word products in pairs to produce the result as packed double words.
<a class="a" id="pand"></a>
<a class="a" id="por"></a>
<a class="a" id="pxor"></a>

<a class="a" id="pandn"></a>

<tt>pand</tt>, <tt>por</tt> and <tt>pxor</tt> perform the logical operations on the quad words,
<tt>pandn</tt> also performs a logical negation of the destination operand before the
operation.

<a class="a" id="pcmpeqb"></a>
<a class="a" id="pcmpeqw"></a>
<a class="a" id="pcmpeqd"></a>

<tt>pcmpeqb</tt>, <tt>pcmpeqw</tt> and <tt>pcmpeqd</tt> compare for equality of packed
bytes, packed words or packed double words. If a pair of data elements is equal,
the corresponding data element in the destination operand is filled with bits of
value 1, otherwise it's set to 0. 

<a class="a" id="pcmpgtb"></a>
<a class="a" id="pcmpgtw"></a>
<a class="a" id="pcmpgtd"></a>

<tt>pcmpgtb</tt>, <tt>pcmpgtw</tt> and <tt>pcmpgtd</tt>
perform the similar operation, but they check whether the data elements in
the destination operand are greater than the corresponding data elements in the
source operand.

<a class="a" id="packsswb"></a>
<a class="a" id="packssdw"></a>
<a class="a" id="packuswb"></a>

<tt>packsswb</tt> converts packed signed words into packed signed bytes, <tt>packssdw</tt>
converts packed signed double words into packed signed words, using saturation to
handle overflow conditions. <tt>packuswb</tt> converts packed signed words into
packed unsigned bytes. Converted data elements from the destination operand are stored
in the low part of the destination operand, while converted data elements from
the source operand are stored in the high part.

<a class="a" id="punpckhbw"></a>
<a class="a" id="punpckhwd"></a>
<a class="a" id="punpckhdq"></a>

<tt>punpckhbw</tt>, <tt>punpckhwd</tt> and <tt>punpckhdq</tt> interleave the
data elements from the high parts of the source and destination operands and store
the result into the destination operand. 

<a class="a" id="punpcklbw"></a>
<a class="a" id="punpcklwd"></a>
<a class="a" id="punpckldq"></a>

<tt>punpcklbw</tt>, <tt>punpcklwd</tt> and
<tt>punpckldq</tt> perform the same operation, but the low parts of the source and destination
operand are used.

<pre>
&nbsp;&nbsp;&nbsp;&nbsp;paddsb&nbsp;mm0,[esi]&nbsp;;&nbsp;add&nbsp;packed&nbsp;bytes&nbsp;with&nbsp;signed&nbsp;saturation
&nbsp;&nbsp;&nbsp;&nbsp;pcmpeqw&nbsp;mm3,mm7&nbsp;&nbsp;;&nbsp;compare&nbsp;packed&nbsp;words&nbsp;for&nbsp;equality

</pre>

<div class="p"><!----></div>
<a class="a" id="psllw"></a>
<a class="a" id="pslld"></a>
<a class="a" id="psllq"></a>
<tt>psllw</tt>, <tt>pslld</tt> and <tt>psllq</tt> perform logical shift left of the packed
words, packed double words or a single quad word in the destination operand by the
amount specified in the source operand. 
<a class="a" id="psrlw"></a>
<a class="a" id="psrld"></a>

<a class="a" id="psrlq"></a>

<tt>psrlw</tt>, <tt>psrld</tt> and <tt>psrlq</tt> perform logical shift right of the packed words, 
packed double words or a single quad word. 
<a class="a" id="psraw"></a>

<a class="a" id="psrad"></a>

<tt>psraw</tt> and <tt>psrad</tt> perform arithmetic shift right of the packed words or
double words. The destination operand should be a MMX register, while source operand
can be a MMX register, 64-bit memory location, or 8-bit immediate value.

<pre>
&nbsp;&nbsp;&nbsp;&nbsp;psllw&nbsp;mm2,mm4&nbsp;&nbsp;&nbsp;&nbsp;;&nbsp;shift&nbsp;words&nbsp;left&nbsp;logically
&nbsp;&nbsp;&nbsp;&nbsp;psrad&nbsp;mm4,[ebx]&nbsp;&nbsp;;&nbsp;shift&nbsp;double&nbsp;words&nbsp;right&nbsp;arithmetically

</pre>

<div class="p"><!----></div>
<a class="a" id="emms"></a>
<tt>emms</tt> makes the FPU registers usable for the FPU instructions, it must be used
before using the FPU instructions if any MMX instructions were used.
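
<div class="p"><!----></div>
A minimal sketch of a block of MMX code followed by FPU code; the buffers
addressed by <tt>esi</tt>, <tt>edi</tt> and <tt>ebx</tt> are assumptions made for
the example:

<pre>
&nbsp;&nbsp;&nbsp;&nbsp;movq&nbsp;mm0,[esi]&nbsp;&nbsp;&nbsp;;&nbsp;load&nbsp;four&nbsp;packed&nbsp;words
&nbsp;&nbsp;&nbsp;&nbsp;paddw&nbsp;mm0,[edi]&nbsp;&nbsp;;&nbsp;add&nbsp;packed&nbsp;words
&nbsp;&nbsp;&nbsp;&nbsp;movq&nbsp;[edi],mm0&nbsp;&nbsp;&nbsp;;&nbsp;store&nbsp;the&nbsp;result
&nbsp;&nbsp;&nbsp;&nbsp;emms&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;;&nbsp;make&nbsp;FPU&nbsp;registers&nbsp;usable&nbsp;again
&nbsp;&nbsp;&nbsp;&nbsp;fld&nbsp;dword&nbsp;[ebx]&nbsp;&nbsp;;&nbsp;FPU&nbsp;instructions&nbsp;can&nbsp;be&nbsp;used&nbsp;from&nbsp;here

</pre>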

<div class="p"><!----></div>
     <a id="tth_sEc2.1.15"></a><h3>
2.1.15&nbsp;&nbsp;SSE instructions</h3>
The SSE extension adds more MMX instructions and also introduces the
operations on packed single precision floating point values. The 128-bit
packed single precision format consists of four single precision floating
point values. The 128-bit SSE registers are designed for the purpose of
operations on this data type.

<div class="p"><!----></div>
<a class="a" id="movaps"></a>
<a class="a" id="movups"></a>
<tt>movaps</tt> and <tt>movups</tt> transfer a double quad word operand containing packed
single precision values from source operand to destination operand. At least
one of the operands has to be a SSE register, the second one can be also a
SSE register or 128-bit memory location. Memory operands for <tt>movaps</tt>
instruction must be aligned on boundary of 16 bytes, operands for <tt>movups</tt>
instruction don't have to be aligned.

<pre>
&nbsp;&nbsp;&nbsp;&nbsp;movups&nbsp;xmm0,[ebx]&nbsp;&nbsp;;&nbsp;move&nbsp;unaligned&nbsp;double&nbsp;quad&nbsp;word

</pre>

<div class="p"><!----></div>
<a class="a" id="movlps"></a>
<a class="a" id="movhps"></a>
<tt>movlps</tt> moves two packed single precision values between the memory and the
low quad word of SSE register. <tt>movhps</tt> moves two packed single precision
values between the memory and the high quad word of SSE register. One of the
operands must be a SSE register, and the other operand must be a 64-bit memory
location.

<pre>
&nbsp;&nbsp;&nbsp;&nbsp;movlps&nbsp;xmm0,[ebx]&nbsp;&nbsp;;&nbsp;move&nbsp;memory&nbsp;to&nbsp;low&nbsp;quad&nbsp;word&nbsp;of&nbsp;xmm0
&nbsp;&nbsp;&nbsp;&nbsp;movhps&nbsp;[esi],xmm7&nbsp;&nbsp;;&nbsp;move&nbsp;high&nbsp;quad&nbsp;word&nbsp;of&nbsp;xmm7&nbsp;to&nbsp;memory

</pre>

<div class="p"><!----></div>
<a class="a" id="movlhps"></a>
<a class="a" id="movhlps"></a>
<tt>movlhps</tt> moves two packed single precision values from the low quad word
of source register to the high quad word of destination register. <tt>movhlps</tt>
moves two packed single precision values from the high quad word of source
register to the low quad word of destination register. Both operands have to
be SSE registers.
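
<div class="p"><!----></div>
A short sketch of both instructions, with registers chosen only for illustration:

<pre>
&nbsp;&nbsp;&nbsp;&nbsp;movlhps&nbsp;xmm0,xmm1&nbsp;&nbsp;;&nbsp;move&nbsp;low&nbsp;quad&nbsp;word&nbsp;of&nbsp;xmm1&nbsp;to&nbsp;high&nbsp;quad&nbsp;word&nbsp;of&nbsp;xmm0
&nbsp;&nbsp;&nbsp;&nbsp;movhlps&nbsp;xmm2,xmm3&nbsp;&nbsp;;&nbsp;move&nbsp;high&nbsp;quad&nbsp;word&nbsp;of&nbsp;xmm3&nbsp;to&nbsp;low&nbsp;quad&nbsp;word&nbsp;of&nbsp;xmm2

</pre>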

<div class="p"><!----></div>
<a class="a" id="movmskps"></a>
<tt>movmskps</tt> transfers the most significant bit of each of the four single
precision values in the SSE register into low four bits of a general register.
The source operand must be a SSE register, the destination operand must be a
general register.
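
<div class="p"><!----></div>
For instance, assuming the four values of interest are in <tt>xmm2</tt>, their
sign bits could be collected like this:

<pre>
&nbsp;&nbsp;&nbsp;&nbsp;movmskps&nbsp;eax,xmm2&nbsp;&nbsp;;&nbsp;move&nbsp;sign&nbsp;bits&nbsp;of&nbsp;four&nbsp;values&nbsp;to&nbsp;low&nbsp;bits&nbsp;of&nbsp;eax

</pre>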

<div class="p"><!----></div>
<a class="a" id="movss"></a>
<tt>movss</tt> transfers a single precision value between source and destination
operand (only the low double word is transferred). At least one of the operands
has to be a SSE register, the second one can be also a SSE register or 32-bit
memory location.

<pre>
&nbsp;&nbsp;&nbsp;&nbsp;movss&nbsp;[edi],xmm3&nbsp;&nbsp;&nbsp;;&nbsp;move&nbsp;low&nbsp;double&nbsp;word&nbsp;of&nbsp;xmm3&nbsp;to&nbsp;memory

................................................................................
destination register. When the mnemonic ends with <tt>ss</tt>, the source operand
can be a 32-bit memory location or a SSE register, the destination operand
must be a SSE register and the operation is performed on single precision
values, only low double words of SSE registers are used in this case, the
result is stored in the low double word of destination register. 

<div class="p"><!----></div>
<a class="a" id="addps"></a>
<a class="a" id="addss"></a>
<a class="a" id="subps"></a>
<a class="a" id="subss"></a>
<a class="a" id="mulps"></a>
<a class="a" id="mulss"></a>
<a class="a" id="divps"></a>
<a class="a" id="divss"></a>
<a class="a" id="rcpps"></a>
<a class="a" id="rcpss"></a>

<a class="a" id="sqrtps"></a>
<a class="a" id="sqrtss"></a>
<a class="a" id="rsqrtps"></a>
<a class="a" id="rsqrtss"></a>
<a class="a" id="maxps"></a>
<a class="a" id="maxss"></a>
<a class="a" id="minps"></a>
<a class="a" id="minss"></a>
<tt>addps</tt> and <tt>addss</tt> add the values, <tt>subps</tt> and <tt>subss</tt> subtract the 
source value from destination value, <tt>mulps</tt> and <tt>mulss</tt> multiply the values, 
<tt>divps</tt> and <tt>divss</tt> divide the destination value by the source value, 
<tt>rcpps</tt> and <tt>rcpss</tt> compute the approximate reciprocal of the source value, 
<tt>sqrtps</tt> and <tt>sqrtss</tt> compute the square root of the source value, 
<tt>rsqrtps</tt> and <tt>rsqrtss</tt> compute the approximate reciprocal of square root 
of the source value, <tt>maxps</tt> and <tt>maxss</tt> compare the source and destination 
values and return the greater one, <tt>minps</tt> and <tt>minss</tt> compare the source and 
................................................................................
<pre>
&nbsp;&nbsp;&nbsp;&nbsp;mulss&nbsp;xmm0,[ebx]&nbsp;&nbsp;&nbsp;;&nbsp;multiply&nbsp;single&nbsp;precision&nbsp;values
&nbsp;&nbsp;&nbsp;&nbsp;addps&nbsp;xmm3,xmm7&nbsp;&nbsp;&nbsp;&nbsp;;&nbsp;add&nbsp;packed&nbsp;single&nbsp;precision&nbsp;values

</pre>

<div class="p"><!----></div>
<a class="a" id="andps"></a>
<a class="a" id="andnps"></a>
<a class="a" id="orps"></a>
<a class="a" id="xorps"></a>
<tt>andps</tt>, <tt>andnps</tt>, <tt>orps</tt> and <tt>xorps</tt> perform the logical operations on
packed single precision values. The source operand can be a 128-bit memory
location or a SSE register, the destination operand must be a SSE register.
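
<div class="p"><!----></div>
A short sketch of these operations might look as follows. The registers are chosen arbitrarily, and <tt>ebx</tt> is assumed to point to a 128-bit value aligned on 16 bytes:

<pre>
&nbsp;&nbsp;&nbsp;&nbsp;xorps&nbsp;xmm0,xmm0&nbsp;&nbsp;&nbsp;&nbsp;;&nbsp;clear&nbsp;all&nbsp;bits&nbsp;of&nbsp;xmm0
&nbsp;&nbsp;&nbsp;&nbsp;andps&nbsp;xmm1,[ebx]&nbsp;&nbsp;&nbsp;;&nbsp;mask&nbsp;xmm1&nbsp;with&nbsp;the&nbsp;128-bit&nbsp;value&nbsp;in&nbsp;memory

</pre>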

<div class="p"><!----></div>
<a class="a" id="cmpps"></a>
<a class="a" id="cmpss"></a>

<a class="a" id="cmpeqps"></a>
<a class="a" id="cmpeqss"></a>
<a class="a" id="cmpltps"></a>
<a class="a" id="cmpltss"></a>
<a class="a" id="cmpleps"></a>
<a class="a" id="cmpless"></a>
<a class="a" id="cmpunordps"></a>
<a class="a" id="cmpunordss"></a>
<a class="a" id="cmpneqps"></a>
<a class="a" id="cmpneqss"></a>
<a class="a" id="cmpnltps"></a>
<a class="a" id="cmpnltss"></a>
<a class="a" id="cmpnleps"></a>
<a class="a" id="cmpnless"></a>
<a class="a" id="cmpordps"></a>
<a class="a" id="cmpordss"></a>
<tt>cmpps</tt> compares packed single precision values and returns a mask result
into the destination operand, which must be a SSE register. The source operand
can be a 128-bit memory location or SSE register, the third operand must be an
immediate operand selecting code of one of the eight compare conditions
(table <a href="#tab:SSE_conditions">2.3</a>). <tt>cmpss</tt> performs the same operation on single precision values,
only low double word of destination register is affected, in this case source
operand can be a 32-bit memory location or SSE register. These two
instructions have also variants with only two operands and the condition
................................................................................

<div class="p"><!----></div>

<div class="p"><!----></div>
<a id="tth_tAb2.3">
</a> 
<div style="text-align:center">
<table border="1" class="tabular">
<tr><td align="center">Code </td><td align="center">Mnemonic </td><td align="center">Description </td></tr><tr><td></td></tr>
<tr><td align="center">0 </td><td align="center"><tt>eq</tt> </td><td align="center">equal </td></tr>
<tr><td align="center">1 </td><td align="center"><tt>lt</tt> </td><td align="center">less than </td></tr>
<tr><td align="center">2 </td><td align="center"><tt>le</tt> </td><td align="center">less than or equal </td></tr>
<tr><td align="center">3 </td><td align="center"><tt>unord</tt> </td><td align="center">unordered </td></tr>
<tr><td align="center">4 </td><td align="center"><tt>neq</tt> </td><td align="center">not equal </td></tr>
<tr><td align="center">5 </td><td align="center"><tt>nlt</tt> </td><td align="center">not less than </td></tr>
................................................................................
</div>

<div style="text-align:center">Table 2.3: SSE conditions.</div>
<a id="tab:SSE_conditions">
</a>

<div class="p"><!----></div>
<a class="a" id="comiss"></a>
<a class="a" id="ucomiss"></a>
<tt>comiss</tt> and <tt>ucomiss</tt> compare the single precision values and set the ZF,
PF and CF flags to show the result. The destination operand must be a SSE
register, the source operand can be a 32-bit memory location or SSE register.
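
<div class="p"><!----></div>
As an illustrative sketch (the registers and the use of <tt>seta</tt> to read the flags are assumptions of this example, not part of the original text):

<pre>
&nbsp;&nbsp;&nbsp;&nbsp;comiss&nbsp;xmm0,xmm1&nbsp;&nbsp;&nbsp;;&nbsp;compare&nbsp;low&nbsp;single&nbsp;precision&nbsp;values
&nbsp;&nbsp;&nbsp;&nbsp;seta&nbsp;al&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;;&nbsp;al=1&nbsp;when&nbsp;CF=0&nbsp;and&nbsp;ZF=0&nbsp;(xmm0&nbsp;value&nbsp;is&nbsp;greater)

</pre>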

<div class="p"><!----></div>
<a class="a" id="shufps"></a>
<tt>shufps</tt> moves any two of the four single precision values from the
destination operand into the low quad word of the destination operand, and any
two of the four values from the source operand into the high quad word of the
destination operand. The destination operand must be a SSE register, the
source operand can be a 128-bit memory location or SSE register, the third
operand must be an 8-bit immediate value selecting which values will be moved
into the destination operand. Bits 0 and 1 select the value to be moved from
destination operand to the low double word of the result, bits 2 and 3 select
................................................................................

<pre>
&nbsp;&nbsp;&nbsp;&nbsp;shufps&nbsp;xmm0,xmm0,10010011b&nbsp;;&nbsp;shuffle&nbsp;double&nbsp;words

</pre>

<div class="p"><!----></div>
<a class="a" id="unpckhps"></a>
<a class="a" id="unpcklps"></a>
<tt>unpckhps</tt> performs an interleaved unpack of the values from the high parts
of the source and destination operands and stores the result in the
destination operand, which must be a SSE register. The source operand can be
a 128-bit memory location or a SSE register. <tt>unpcklps</tt> performs an
interleaved unpack of the values from the low parts of the source and
destination operand and stores the result in the destination operand,
the rules for operands are the same.
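
<div class="p"><!----></div>
A possible usage, given here only as a sketch with arbitrarily chosen registers and with <tt>ebx</tt> assumed to address 16-byte aligned data:

<pre>
&nbsp;&nbsp;&nbsp;&nbsp;unpcklps&nbsp;xmm0,xmm1&nbsp;&nbsp;&nbsp;;&nbsp;interleave&nbsp;values&nbsp;from&nbsp;low&nbsp;parts&nbsp;of&nbsp;xmm0&nbsp;and&nbsp;xmm1
&nbsp;&nbsp;&nbsp;&nbsp;unpckhps&nbsp;xmm2,[ebx]&nbsp;&nbsp;;&nbsp;interleave&nbsp;values&nbsp;from&nbsp;high&nbsp;parts&nbsp;of&nbsp;xmm2&nbsp;and&nbsp;memory

</pre>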

<div class="p"><!----></div>
<a class="a" id="cvtpi2ps"></a>
<tt>cvtpi2ps</tt> converts packed two double word integers into packed two
single precision floating point values and stores the result in the low quad
word of the destination operand, which should be a SSE register. The source
operand can be a 64-bit memory location or MMX register.

<pre>
&nbsp;&nbsp;&nbsp;&nbsp;cvtpi2ps&nbsp;xmm0,mm0&nbsp;&nbsp;;&nbsp;integers&nbsp;to&nbsp;single&nbsp;precision&nbsp;values

</pre>

<div class="p"><!----></div>
<a class="a" id="cvtsi2ss"></a>
<tt>cvtsi2ss</tt> converts a double word integer into a single precision floating
point value and stores the result in the low double word of the destination
operand, which should be a SSE register. The source operand can be a 32-bit
memory location or 32-bit general register.

<pre>
&nbsp;&nbsp;&nbsp;&nbsp;cvtsi2ss&nbsp;xmm0,eax&nbsp;&nbsp;;&nbsp;integer&nbsp;to&nbsp;single&nbsp;precision&nbsp;value

</pre>

<div class="p"><!----></div>
<a class="a" id="cvtps2pi"></a>
<a class="a" id="cvttps2pi"></a>
<tt>cvtps2pi</tt> converts packed two single precision floating point values into
packed two double word integers and stores the result in the destination
operand, which should be a MMX register. The source operand can be a 64-bit
memory location or SSE register, only low quad word of SSE register is used.
<tt>cvttps2pi</tt> performs a similar operation, except that truncation is used to
round the source values to integers, rules for the operands are the same.

<pre>
&nbsp;&nbsp;&nbsp;&nbsp;cvtps2pi&nbsp;mm0,xmm0&nbsp;&nbsp;;&nbsp;single&nbsp;precision&nbsp;values&nbsp;to&nbsp;integers

</pre>

<div class="p"><!----></div>
<a class="a" id="cvtss2si"></a>
<a class="a" id="cvttss2si"></a>
<tt>cvtss2si</tt> converts a single precision floating point value into a double
word integer and stores the result in the destination operand, which should be
a 32-bit general register. The source operand can be a 32-bit memory location
or SSE register, only low double word of SSE register is used. <tt>cvttss2si</tt>
performs a similar operation, except that truncation is used to round the
source value to an integer, rules for the operands are the same.

<pre>
&nbsp;&nbsp;&nbsp;&nbsp;cvtss2si&nbsp;eax,xmm0&nbsp;&nbsp;;&nbsp;single&nbsp;precision&nbsp;value&nbsp;to&nbsp;integer

</pre>

<div class="p"><!----></div>
<a class="a" id="pextrw"></a>
<tt>pextrw</tt> copies the word in the source operand specified by the third
operand to the destination operand. The source operand must be a MMX register,
the destination operand must be a 32-bit general register (the high word of
the destination is cleared), the third operand must be an 8-bit immediate value.

<pre>
&nbsp;&nbsp;&nbsp;&nbsp;pextrw&nbsp;eax,mm0,1&nbsp;&nbsp;&nbsp;;&nbsp;extract&nbsp;word&nbsp;into&nbsp;eax

</pre>

<div class="p"><!----></div>
<a class="a" id="pinsrw"></a>
<tt>pinsrw</tt> inserts a word from the source operand in the destination operand
at the location specified with the third operand, which must be an 8-bit
immediate value. The destination operand must be a MMX register, the source
operand can be a 16-bit memory location or 32-bit general register (only low
word of the register is used).

<pre>
&nbsp;&nbsp;&nbsp;&nbsp;pinsrw&nbsp;mm1,ebx,2&nbsp;&nbsp;&nbsp;;&nbsp;insert&nbsp;word&nbsp;from&nbsp;ebx

</pre>

<div class="p"><!----></div>
<a class="a" id="pavgb"></a>
<a class="a" id="pavgw"></a>

<a class="a" id="pmaxub"></a>
<a class="a" id="pminub"></a>
<a class="a" id="pmaxsw"></a>
<a class="a" id="pminsw"></a>
<a class="a" id="pmulhuw"></a>
<a class="a" id="psadbw"></a>
<tt>pavgb</tt> and <tt>pavgw</tt> compute the average of packed bytes or words. <tt>pmaxub</tt>
returns the maximum values of packed unsigned bytes, <tt>pminub</tt> returns the
minimum values of packed unsigned bytes, <tt>pmaxsw</tt> returns the maximum values
of packed signed words, <tt>pminsw</tt> returns the minimum values of packed signed
words. <tt>pmulhuw</tt> performs an unsigned multiplication of the packed words and stores
the high words of the results in the destination operand. <tt>psadbw</tt> computes
the absolute differences of packed unsigned bytes, sums the differences, and
stores the sum in the low word of destination operand. All these instructions
follow the same rules for operands as the general MMX operations described in
previous section.
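
<div class="p"><!----></div>
The following lines are a sketch of how some of these instructions could be written. The registers are arbitrary and <tt>esi</tt> is assumed to point to a 64-bit memory location:

<pre>
&nbsp;&nbsp;&nbsp;&nbsp;pavgb&nbsp;mm0,mm1&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;;&nbsp;average&nbsp;of&nbsp;packed&nbsp;bytes
&nbsp;&nbsp;&nbsp;&nbsp;pmaxsw&nbsp;mm2,[esi]&nbsp;&nbsp;;&nbsp;maximum&nbsp;values&nbsp;of&nbsp;packed&nbsp;signed&nbsp;words
&nbsp;&nbsp;&nbsp;&nbsp;psadbw&nbsp;mm3,mm4&nbsp;&nbsp;&nbsp;&nbsp;;&nbsp;sum&nbsp;of&nbsp;absolute&nbsp;differences&nbsp;into&nbsp;low&nbsp;word&nbsp;of&nbsp;mm3

</pre>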

<div class="p"><!----></div>
<a class="a" id="pmovmskb"></a>
<tt>pmovmskb</tt> creates a mask made of the most significant bit of each byte in
the source operand and stores the result in the low byte of destination
operand. The source operand must be a MMX register, the destination operand
must be a 32-bit general register.

<div class="p"><!----></div>
<a class="a" id="pshufw"></a>
<tt>pshufw</tt> inserts words from the source operand in the destination operand
from the locations specified with the third operand. The destination operand
must be a MMX register, the source operand can be a 64-bit memory location or
MMX register, the third operand must be an 8-bit immediate value selecting which
values will be moved into the destination operand, in a similar way as the third
operand of the <tt>shufps</tt> instruction.
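
<div class="p"><!----></div>
As a sketch, assuming each pair of bits of the immediate value selects the source word for the corresponding word of the destination (registers are chosen arbitrarily):

<pre>
&nbsp;&nbsp;&nbsp;&nbsp;pshufw&nbsp;mm0,mm1,00011011b&nbsp;;&nbsp;words&nbsp;of&nbsp;mm1&nbsp;in&nbsp;reversed&nbsp;order&nbsp;into&nbsp;mm0

</pre>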

<div class="p"><!----></div>

<a class="a" id="movntq"></a>
<a class="a" id="movntps"></a>
<a class="a" id="maskmovq"></a>
<tt>movntq</tt> moves the quad word from the source operand to memory using a
non-temporal hint to minimize cache pollution. The source operand should be a
MMX register, the destination operand should be a 64-bit memory location.
<tt>movntps</tt> stores packed single precision values from the SSE register to
memory using a non-temporal hint. The source operand should be a SSE register,
the destination operand should be a 128-bit memory location. <tt>maskmovq</tt> stores
selected bytes from the first operand into a 64-bit memory location using a
non-temporal hint. Both operands should be MMX registers, the second operand
selects which bytes from the source operand are written to memory. The
memory location is pointed to by the DI (or EDI) register in the segment selected
by DS.
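
<div class="p"><!----></div>
A hedged sketch of the three stores follows. It assumes 32-bit code, with <tt>edi</tt> pointing to a writable buffer that is aligned on 16 bytes for <tt>movntps</tt>:

<pre>
&nbsp;&nbsp;&nbsp;&nbsp;movntq&nbsp;[edi],mm0&nbsp;&nbsp;&nbsp;;&nbsp;non-temporal&nbsp;store&nbsp;of&nbsp;quad&nbsp;word
&nbsp;&nbsp;&nbsp;&nbsp;movntps&nbsp;[edi],xmm0&nbsp;;&nbsp;non-temporal&nbsp;store&nbsp;of&nbsp;packed&nbsp;single&nbsp;precision&nbsp;values
&nbsp;&nbsp;&nbsp;&nbsp;maskmovq&nbsp;mm0,mm1&nbsp;&nbsp;&nbsp;;&nbsp;store&nbsp;bytes&nbsp;of&nbsp;mm0&nbsp;selected&nbsp;by&nbsp;mm1&nbsp;at&nbsp;ds:edi

</pre>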

<div class="p"><!----></div>
<a class="a" id="prefetch0"></a>
<a class="a" id="prefetch1"></a>
<a class="a" id="prefetch2"></a>
<a class="a" id="prefetchnta"></a>
<tt>prefetcht0</tt>, <tt>prefetcht1</tt>, <tt>prefetcht2</tt> and <tt>prefetchnta</tt> fetch the line
of data from memory that contains byte specified with the operand to a
specified location in hierarchy.  The operand should be an 8-bit memory
location.

<div class="p"><!----></div>
<a class="a" id="sfence"></a>
<tt>sfence</tt> performs a serializing operation on all instructions storing to
memory that were issued prior to it. This instruction has no operands.

<div class="p"><!----></div>
<a class="a" id="ldmxcsr"></a>
<a class="a" id="stmxcsr"></a>
<tt>ldmxcsr</tt> loads the 32-bit memory operand into the MXCSR register. <tt>stmxcsr</tt>
stores the contents of MXCSR into a 32-bit memory operand.
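
<div class="p"><!----></div>
One possible read-modify-write sequence is sketched below. The scratch location at <tt>ebx</tt> and the choice of bit 15 (the flush-to-zero control bit of MXCSR) are assumptions of this example:

<pre>
&nbsp;&nbsp;&nbsp;&nbsp;stmxcsr&nbsp;[ebx]&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;;&nbsp;store&nbsp;MXCSR&nbsp;into&nbsp;memory
&nbsp;&nbsp;&nbsp;&nbsp;or&nbsp;dword&nbsp;[ebx],8000h&nbsp;;&nbsp;set&nbsp;bit&nbsp;15&nbsp;in&nbsp;the&nbsp;stored&nbsp;copy
&nbsp;&nbsp;&nbsp;&nbsp;ldmxcsr&nbsp;[ebx]&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;;&nbsp;load&nbsp;modified&nbsp;value&nbsp;back&nbsp;into&nbsp;MXCSR

</pre>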

<div class="p"><!----></div>

<a class="a" id="fxsave"></a>
<a class="a" id="fxrstor"></a>
<tt>fxsave</tt> saves the current state of the FPU, MXCSR register, and all the FPU
and SSE registers to a 512-byte memory location specified in the destination
operand. <tt>fxrstor</tt> reloads data previously stored with <tt>fxsave</tt> instruction
from the specified 512-byte memory location. The memory operand for both these
instructions must be aligned on a 16-byte boundary and should be declared with
no size specified.
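
<div class="p"><!----></div>
A minimal sketch, assuming <tt>ebx</tt> holds the address of a 512-byte buffer aligned on 16 bytes:

<pre>
&nbsp;&nbsp;&nbsp;&nbsp;fxsave&nbsp;[ebx]&nbsp;&nbsp;&nbsp;&nbsp;;&nbsp;save&nbsp;FPU&nbsp;and&nbsp;SSE&nbsp;state&nbsp;into&nbsp;the&nbsp;buffer
&nbsp;&nbsp;&nbsp;&nbsp;fxrstor&nbsp;[ebx]&nbsp;&nbsp;&nbsp;;&nbsp;restore&nbsp;the&nbsp;state&nbsp;from&nbsp;the&nbsp;same&nbsp;buffer

</pre>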

<div class="p"><!----></div>
................................................................................
     <a id="tth_sEc2.1.16"></a><h3>
2.1.16&nbsp;&nbsp;SSE2 instructions</h3>
The SSE2 extension introduces the operations on packed double precision
floating point values, extends the syntax of MMX instructions, and adds also
some new instructions.

<div class="p"><!----></div>
<a class="a" id="movapd"></a>
<a class="a" id="movupd"></a>
<tt>movapd</tt> and <tt>movupd</tt> transfer a double quad word operand containing packed
double precision values from source operand to destination operand. These
instructions are analogous to <tt>movaps</tt> and <tt>movups</tt> and have the same rules
for operands.

<div class="p"><!----></div>
<a class="a" id="movlpd"></a>
<a class="a" id="movhpd"></a>
<tt>movlpd</tt> moves double precision value between the memory and the low quad
word of SSE register. <tt>movhpd</tt> moves double precision value between the memory
and the high quad word of SSE register. These instructions are analogous to
<tt>movlps</tt> and <tt>movhps</tt> and have the same rules for operands.

<div class="p"><!----></div>
<a class="a" id="movmskpd"></a>
<tt>movmskpd</tt> transfers the most significant bit of each of the two double
precision values in the SSE register into low two bits of a general register.
This instruction is analogous to <tt>movmskps</tt> and has the same rules for
operands.

<div class="p"><!----></div>
<a class="a" id="movsd"></a>
<tt>movsd</tt> transfers a double precision value between source and destination
operand (only the low quad word is transferred). At least one of the operands
has to be a SSE register, the second one can be also a SSE register or 64-bit
memory location.
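
<div class="p"><!----></div>
For illustration (registers and addresses are arbitrary assumptions of this sketch):

<pre>
&nbsp;&nbsp;&nbsp;&nbsp;movsd&nbsp;xmm0,[ebx]&nbsp;&nbsp;&nbsp;;&nbsp;move&nbsp;memory&nbsp;to&nbsp;low&nbsp;quad&nbsp;word&nbsp;of&nbsp;xmm0
&nbsp;&nbsp;&nbsp;&nbsp;movsd&nbsp;[edi],xmm1&nbsp;&nbsp;&nbsp;;&nbsp;move&nbsp;low&nbsp;quad&nbsp;word&nbsp;of&nbsp;xmm1&nbsp;to&nbsp;memory

</pre>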

<div class="p"><!----></div>
<a class="a" id="addpd"></a>
<a class="a" id="addsd"></a>
<a class="a" id="subpd"></a>
<a class="a" id="subsd"></a>
<a class="a" id="mulpd"></a>
<a class="a" id="mulsd"></a>
<a class="a" id="divpd"></a>
<a class="a" id="divsd"></a>

<a class="a" id="sqrtpd"></a>
<a class="a" id="sqrtsd"></a>
<a class="a" id="maxpd"></a>
<a class="a" id="maxsd"></a>
<a class="a" id="minpd"></a>
<a class="a" id="minsd"></a>
Arithmetic operations on double precision values are: <tt>addpd</tt>, <tt>addsd</tt>,
<tt>subpd</tt>, <tt>subsd</tt>, <tt>mulpd</tt>, <tt>mulsd</tt>, <tt>divpd</tt>, <tt>divsd</tt>, <tt>sqrtpd</tt>, <tt>sqrtsd</tt>,
<tt>maxpd</tt>, <tt>maxsd</tt>, <tt>minpd</tt>, <tt>minsd</tt>, and they are analogous to arithmetic
operations on single precision values described in previous section. When the
mnemonic ends with <tt>pd</tt> instead of <tt>ps</tt>, the operation is performed on packed
two double precision values, but rules for operands are the same. When the
mnemonic ends with <tt>sd</tt> instead of <tt>ss</tt>, the source operand can be a 64-bit
memory location or a SSE register, the destination operand must be a SSE
register and the operation is performed on double precision values, only low
quad words of SSE registers are used in this case.
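
<div class="p"><!----></div>
A few of these operations are sketched below, with arbitrarily chosen registers and <tt>ebx</tt> assumed to point to a 64-bit value:

<pre>
&nbsp;&nbsp;&nbsp;&nbsp;addpd&nbsp;xmm0,xmm1&nbsp;&nbsp;&nbsp;&nbsp;;&nbsp;add&nbsp;packed&nbsp;double&nbsp;precision&nbsp;values
&nbsp;&nbsp;&nbsp;&nbsp;mulsd&nbsp;xmm2,[ebx]&nbsp;&nbsp;&nbsp;;&nbsp;multiply&nbsp;double&nbsp;precision&nbsp;values
&nbsp;&nbsp;&nbsp;&nbsp;sqrtsd&nbsp;xmm3,xmm3&nbsp;&nbsp;&nbsp;;&nbsp;square&nbsp;root&nbsp;of&nbsp;low&nbsp;quad&nbsp;word&nbsp;of&nbsp;xmm3

</pre>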

<div class="p"><!----></div>
<a class="a" id="andpd"></a>
<a class="a" id="andnpd"></a>
<a class="a" id="orpd"></a>
<a class="a" id="xorpd"></a>
<tt>andpd</tt>, <tt>andnpd</tt>, <tt>orpd</tt> and <tt>xorpd</tt> perform the logical operations on
packed double precision values. They are analogous to SSE logical operations
on single precision values and have the same rules for operands.

<div class="p"><!----></div>
<a class="a" id="cmppd"></a>
<a class="a" id="cmpsd"></a>

<a class="a" id="cmpeqpd"></a>
<a class="a" id="cmpeqsd"></a>
<a class="a" id="cmpltpd"></a>
<a class="a" id="cmpltsd"></a>
<a class="a" id="cmplepd"></a>
<a class="a" id="cmplesd"></a>
<a class="a" id="cmpunordpd"></a>
<a class="a" id="cmpunordsd"></a>
<a class="a" id="cmpneqpd"></a>
<a class="a" id="cmpneqsd"></a>
<a class="a" id="cmpnltpd"></a>
<a class="a" id="cmpnltsd"></a>
<a class="a" id="cmpnlepd"></a>
<a class="a" id="cmpnlesd"></a>
<a class="a" id="cmpordpd"></a>
<a class="a" id="cmpordsd"></a>
<tt>cmppd</tt> compares packed double precision values and returns a
mask result into the destination operand. This instruction is analogous to
<tt>cmpps</tt> and has the same rules for operands. <tt>cmpsd</tt> performs the same
operation on double precision values, only low quad word of destination
register is affected, in this case source operand can be a 64-bit memory location or
SSE register. Variants with only two operands are obtained by attaching the
condition mnemonic from table <a href="#tab:SSE_conditions">2.3</a> to the <tt>cmp</tt> mnemonic and then attaching
the <tt>pd</tt> or <tt>sd</tt> at the end.
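
<div class="p"><!----></div>
The two syntaxes can be sketched as follows. The registers are arbitrary, and code 1 is the <tt>lt</tt> condition listed in table 2.3:

<pre>
&nbsp;&nbsp;&nbsp;&nbsp;cmppd&nbsp;xmm0,xmm1,1&nbsp;&nbsp;&nbsp;;&nbsp;compare&nbsp;packed&nbsp;values&nbsp;for&nbsp;less&nbsp;than
&nbsp;&nbsp;&nbsp;&nbsp;cmpltpd&nbsp;xmm0,xmm1&nbsp;&nbsp;&nbsp;;&nbsp;the&nbsp;same&nbsp;comparison&nbsp;in&nbsp;two-operand&nbsp;form
&nbsp;&nbsp;&nbsp;&nbsp;cmpeqsd&nbsp;xmm2,[ebx]&nbsp;&nbsp;;&nbsp;compare&nbsp;low&nbsp;quad&nbsp;words&nbsp;for&nbsp;equality

</pre>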

<div class="p"><!----></div>
<a class="a" id="comisd"></a>
<a class="a" id="ucomisd"></a>
<tt>comisd</tt> and <tt>ucomisd</tt> compare the double precision values and set the ZF,
PF and CF flags to show the result. The destination operand must be a SSE
register, the source operand can be a 64-bit memory location or SSE register.

<div class="p"><!----></div>
<a class="a" id="shufpd"></a>
<a class="a" id="shufps"></a>
<tt>shufpd</tt> moves any of the two double precision values from the destination
operand into the low quad word of the destination operand, and any of the two
values from the source operand into the high quad word of the destination
operand. This instruction is analogous to <tt>shufps</tt> and has the same rules for
operands. Bit 0 of the third operand selects the value to be moved from the
destination operand, bit 1 selects the value to be moved from the source
operand, the rest of bits are reserved and must be zeroed.
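
<div class="p"><!----></div>
As a sketch of the selection bits described above (the registers are an arbitrary choice for this example):

<pre>
&nbsp;&nbsp;&nbsp;&nbsp;shufpd&nbsp;xmm0,xmm1,01b&nbsp;;&nbsp;high&nbsp;quad&nbsp;word&nbsp;of&nbsp;xmm0&nbsp;and&nbsp;low&nbsp;quad&nbsp;word&nbsp;of&nbsp;xmm1

</pre>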

<div class="p"><!----></div>
<a class="a" id="unpckhpd"></a>
<a class="a" id="unpcklpd"></a>
<tt>unpckhpd</tt> performs an unpack of the high quad words from the source and
destination operands, <tt>unpcklpd</tt> performs an unpack of the low quad words from
the source and destination operands. They are analogous to <tt>unpckhps</tt> and
<tt>unpcklps</tt>, and have the same rules for operands.

<div class="p"><!----></div>

<a class="a" id="cvtps2pd"></a>
<a class="a" id="cvtpd2ps"></a>
<a class="a" id="cvtss2sd"></a>
<a class="a" id="cvtsd2ss"></a>
<tt>cvtps2pd</tt> converts the packed two single precision floating point values to
two packed double precision floating point values, the destination operand
must be a SSE register, the source operand can be a 64-bit memory location or
SSE register. <tt>cvtpd2ps</tt> converts the packed two double precision floating
point values to packed two single precision floating point values, the
destination operand must be a SSE register, the source operand can be a
128-bit memory location or SSE register. <tt>cvtss2sd</tt> converts the single
precision floating point value to double precision floating point value, the
................................................................................
destination operand must be a SSE register, the source operand can be a 32-bit
memory location or SSE register. <tt>cvtsd2ss</tt> converts the double precision
floating point value to single precision floating point value, the destination
operand must be a SSE register, the source operand can be 64-bit memory
location or SSE register.

<div class="p"><!----></div>

<a class="a" id="cvtpi2pd"></a>
<a class="a" id="cvtsi2sd"></a>
<a class="a" id="cvtpd2pi"></a>
<a class="a" id="cvttpd2pi"></a>
<a class="a" id="cvtsd2si"></a>
<a class="a" id="cvttsd2si"></a>
<tt>cvtpi2pd</tt> converts packed two double word integers into the packed
double precision floating point values, the destination operand must be a SSE
register, the source operand can be a 64-bit memory location or MMX register.
<tt>cvtsi2sd</tt> converts a double word integer into a double precision floating
point value, the destination operand must be a SSE register, the source
operand can be a 32-bit memory location or 32-bit general register. <tt>cvtpd2pi</tt>
converts packed double precision floating point values into packed two double
word integers, the destination operand should be a MMX register, the source
................................................................................
precision floating point value into a double word integer, the destination
operand should be a 32-bit general register, the source operand can be a
64-bit memory location or SSE register. <tt>cvttsd2si</tt> performs a similar
operation, except that truncation is used to round the source value to an integer,
rules for operands are the same.

<div class="p"><!----></div>

<a class="a" id="cvtps2dq"></a>
<a class="a" id="cvttps2dq"></a>
<a class="a" id="cvtpd2dq"></a>
<a class="a" id="cvttpd2dq"></a>
<a class="a" id="cvtdq2ps"></a>
<tt>cvtps2dq</tt> and <tt>cvttps2dq</tt> convert packed single precision floating point
values to packed four double word integers, storing them in the destination
operand. <tt>cvtpd2dq</tt> and <tt>cvttpd2dq</tt> convert packed double precision floating
point values to packed two double word integers, storing the result in the low
quad word of the destination operand. <tt>cvtdq2ps</tt> converts packed four double 
word integers to packed single precision floating point values. 

<div class="p"><!----></div>
For all these instructions destination operand must be a SSE register, the
source operand can be a 128-bit memory location or SSE register.

<div class="p"><!----></div>
<a class="a" id="cvtdq2pd"></a>
<tt>cvtdq2pd</tt> converts packed two double word integers from the low quad word
of the source operand to packed double precision floating point values, the source can be a 64-bit
memory location or SSE register, destination has to be SSE register.

<div class="p"><!----></div>
<a class="a" id="movdqa"></a>
<a class="a" id="movdqu"></a>
<tt>movdqa</tt> and <tt>movdqu</tt> transfer a double quad word operand containing packed
integers from source operand to destination operand. At least one of the
operands has to be a SSE register, the second one can be also a SSE register
or 128-bit memory location. Memory operands for <tt>movdqa</tt> instruction must be
aligned on boundary of 16 bytes, operands for <tt>movdqu</tt> instruction don't have
to be aligned.
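
<div class="p"><!----></div>
A brief sketch, assuming <tt>ebx</tt> points to data aligned on 16 bytes while <tt>edi</tt> may hold any address:

<pre>
&nbsp;&nbsp;&nbsp;&nbsp;movdqa&nbsp;xmm0,[ebx]&nbsp;&nbsp;;&nbsp;load&nbsp;aligned&nbsp;double&nbsp;quad&nbsp;word
&nbsp;&nbsp;&nbsp;&nbsp;movdqu&nbsp;[edi],xmm0&nbsp;&nbsp;;&nbsp;store&nbsp;it&nbsp;to&nbsp;possibly&nbsp;unaligned&nbsp;memory

</pre>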

<div class="p"><!----></div>
<a class="a" id="movq2dq"></a>
<a class="a" id="movdq2q"></a>
<tt>movq2dq</tt> moves the contents of the MMX source register to the low quad word
of destination SSE register. <tt>movdq2q</tt> moves the low quad word from the source
SSE register to the destination MMX register.

<pre>
&nbsp;&nbsp;&nbsp;&nbsp;movq2dq&nbsp;xmm0,mm1&nbsp;&nbsp;&nbsp;;&nbsp;move&nbsp;from&nbsp;MMX&nbsp;register&nbsp;to&nbsp;SSE&nbsp;register
&nbsp;&nbsp;&nbsp;&nbsp;movdq2q&nbsp;mm0,xmm1&nbsp;&nbsp;&nbsp;;&nbsp;move&nbsp;from&nbsp;SSE&nbsp;register&nbsp;to&nbsp;MMX&nbsp;register

</pre>

<div class="p"><!----></div>

<a class="a" id="pshufhw"></a>
<a class="a" id="pshuflw"></a>
<a class="a" id="pshufd"></a>
All MMX instructions operating on the 64-bit packed integers (those with
mnemonics starting with <tt>p</tt>) are extended to operate on 128-bit packed
integers located in SSE registers. Additional syntax for these instructions
needs an SSE register where MMX register was needed, and the 128-bit memory
location or SSE register where 64-bit memory location or MMX register were
needed. The exception is <tt>pshufw</tt> instruction, which doesn't allow extended
syntax, but has two new variants: <tt>pshufhw</tt> and <tt>pshuflw</tt>, which allow only
the extended syntax, and perform the same operation as <tt>pshufw</tt> on the high
or low quad words of operands respectively. Also the new instruction <tt>pshufd</tt>
is introduced, which performs the same operation as <tt>pshufw</tt>, but on the
double words instead of words, it allows only the extended syntax.

<pre>
&nbsp;&nbsp;&nbsp;&nbsp;psubb&nbsp;xmm0,[esi]&nbsp;&nbsp;&nbsp;;&nbsp;subtract&nbsp;16&nbsp;packed&nbsp;bytes
&nbsp;&nbsp;&nbsp;&nbsp;pextrw&nbsp;eax,xmm0,7&nbsp;&nbsp;;&nbsp;extract&nbsp;highest&nbsp;word&nbsp;into&nbsp;eax

</pre>

<div class="p"><!----></div>
<a class="a" id="paddq"></a>
<a class="a" id="psubq"></a>
<a class="a" id="pmuludq"></a>
<tt>paddq</tt> performs the addition of packed quad words, <tt>psubq</tt> performs the
subtraction of packed quad words, <tt>pmuludq</tt> performs an unsigned multiplication
of low double words from each corresponding quad words and returns the results
in packed quad words. These instructions follow the same rules for operands as
the general MMX operations described in <a href="#sec:MMX_instructions">2.1.14</a>.

<div class="p"><!----></div>
<a class="a" id="pslldq"></a>
<a class="a" id="psrldq"></a>
<tt>pslldq</tt> and <tt>psrldq</tt> perform logical shift left or right of the double
quad word in the destination operand by the amount of bytes specified in the source
operand. The destination operand should be a SSE register, source operand
should be an 8-bit immediate value.
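
<div class="p"><!----></div>
For example (a sketch with arbitrarily chosen registers and shift counts):

<pre>
&nbsp;&nbsp;&nbsp;&nbsp;pslldq&nbsp;xmm0,4&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;;&nbsp;shift&nbsp;xmm0&nbsp;left&nbsp;by&nbsp;4&nbsp;bytes
&nbsp;&nbsp;&nbsp;&nbsp;psrldq&nbsp;xmm1,8&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;;&nbsp;shift&nbsp;xmm1&nbsp;right&nbsp;by&nbsp;8&nbsp;bytes

</pre>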

<div class="p"><!----></div>
<a class="a" id="punpckhqdq"></a>
<a class="a" id="punpcklqdq"></a>
<tt>punpckhqdq</tt> interleaves the high quad word of the source operand and the
high quad word of the destination operand and writes them to the destination
SSE register. <tt>punpcklqdq</tt> interleaves the low quad word of the source operand
and the low quad word of the destination operand and writes them to the
destination SSE register. The source operand can be a 128-bit memory location
or SSE register.
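
<div class="p"><!----></div>
An illustrative sketch, with registers chosen arbitrarily and <tt>ebx</tt> assumed to address 16-byte aligned data:

<pre>
&nbsp;&nbsp;&nbsp;&nbsp;punpcklqdq&nbsp;xmm0,xmm1&nbsp;&nbsp;&nbsp;;&nbsp;low&nbsp;quad&nbsp;words&nbsp;of&nbsp;xmm0&nbsp;and&nbsp;xmm1&nbsp;into&nbsp;xmm0
&nbsp;&nbsp;&nbsp;&nbsp;punpckhqdq&nbsp;xmm2,[ebx]&nbsp;&nbsp;;&nbsp;high&nbsp;quad&nbsp;words&nbsp;of&nbsp;xmm2&nbsp;and&nbsp;memory&nbsp;into&nbsp;xmm2

</pre>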

<div class="p"><!----></div>

<a class="a" id="movntdq"></a>
<a class="a" id="movntpd"></a>
<a class="a" id="movnti"></a>
<a class="a" id="maskmovdqu"></a>
<tt>movntdq</tt> stores packed integer data from the SSE register to memory using
non-temporal hint. The source operand should be a SSE register, the
destination operand should be a 128-bit memory location. <tt>movntpd</tt> stores
packed double precision values from the SSE register to memory using a
non-temporal hint. Rules for operands are the same. <tt>movnti</tt> stores integer
from a general register to memory using a non-temporal hint. The source
operand should be a 32-bit general register, the destination operand should
be a 32-bit memory location. <tt>maskmovdqu</tt> stores selected bytes from the first
................................................................................
operand into a 128-bit memory location using a non-temporal hint. Both
operands should be SSE registers, the second operand selects which bytes from
the source operand are written to memory. The memory location is pointed to by the DI
(or EDI) register in the segment selected by DS and does not need to be
aligned.

<div class="p"><!----></div>
<a class="a" id="clflush"></a>
<tt>clflush</tt> writes and invalidates the cache line associated with the address
of byte specified with the operand, which should be an 8-bit memory location.

<div class="p"><!----></div>

<a class="a" id="lfence"></a>
<a class="a" id="mfence"></a>
<a class="a" id="sfence"></a>
<tt>lfence</tt> performs a serializing operation on all instructions loading from
memory that were issued prior to it. <tt>mfence</tt> performs a serializing operation
on all instructions accessing memory that were issued prior to it, and so it
combines the functions of <tt>sfence</tt> (described in previous section) and
<tt>lfence</tt> instructions. These instructions have no operands.

<div class="p"><!----></div>
     <a id="tth_sEc2.1.17"></a><h3>
2.1.17&nbsp;&nbsp;SSE3 instructions</h3>
Prescott technology introduced some new instructions to improve
the performance of SSE and SSE2 - this extension is called SSE3.

<div class="p"><!----></div>
<a class="a" id="fisttp"></a>
<tt>fisttp</tt> behaves like the <tt>fistp</tt> instruction and accepts the same operands,
the only difference is that it always uses truncation, irrespective of the
rounding mode.

<div class="p"><!----></div>
<a class="a" id="movshdup"></a>
<tt>movshdup</tt> loads into destination operand the 128-bit value obtained from
the source value of the same size by filling each quad word with the two
duplicates of the value in its high double word.

<div class="p"><!----></div>
<a class="a" id="movsldup"></a>
<tt>movsldup</tt> performs the same action, except it duplicates the values of low double words.
The destination operand should be SSE register, the source operand can be SSE register or
128-bit memory location.

<div class="p"><!----></div>
<a class="a" id="movddup"></a>
<tt>movddup</tt> loads the 64-bit source value and duplicates it into high and low
quad word of the destination operand. The destination operand should be SSE
register, the source operand can be SSE register or 64-bit memory location.
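
<div class="p"><!----></div>
The three duplicating moves could be written as in this sketch, where the registers and addresses are assumptions and <tt>ebx</tt> is taken to point to 16-byte aligned data:

<pre>
&nbsp;&nbsp;&nbsp;&nbsp;movshdup&nbsp;xmm0,xmm1&nbsp;&nbsp;&nbsp;;&nbsp;duplicate&nbsp;high&nbsp;double&nbsp;word&nbsp;of&nbsp;each&nbsp;quad&nbsp;word
&nbsp;&nbsp;&nbsp;&nbsp;movsldup&nbsp;xmm2,[ebx]&nbsp;&nbsp;;&nbsp;duplicate&nbsp;low&nbsp;double&nbsp;word&nbsp;of&nbsp;each&nbsp;quad&nbsp;word
&nbsp;&nbsp;&nbsp;&nbsp;movddup&nbsp;xmm3,[esi]&nbsp;&nbsp;&nbsp;;&nbsp;duplicate&nbsp;64-bit&nbsp;value&nbsp;from&nbsp;memory

</pre>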

<div class="p"><!----></div>
<a class="a" id="lddqu"></a>
<tt>lddqu</tt> is functionally equivalent to <tt>movdqu</tt> with memory as
source operand, but it may improve performance when the source operand crosses
a cacheline boundary. The destination operand has to be SSE register, the source
operand must be 128-bit memory location.

<div class="p"><!----></div>
<a class="a" id="adddubps"></a>
<tt>addsubps</tt> performs single precision addition of the second and fourth pairs and
single precision subtraction of the first and third pairs of floating point
values in the operands.

<div class="p"><!----></div>
<a class="a" id="addsubpd"></a>
<tt>addsubpd</tt> performs double precision addition of the
second pair and double precision subtraction of the first pair of floating
point values in the operands.

<div class="p"><!----></div>
<a class="a" id="haddps"></a>
<tt>haddps</tt> performs the addition of two single
precision values within each quad word of source and destination operands,
and stores the results of such horizontal addition of values from destination
operand into low quad word of destination operand, and the results from the
source operand into high quad word of destination operand.

<div class="p"><!----></div>
<a class="a" id="haddpd"></a>
<tt>haddpd</tt> performs
the addition of two double precision values within each operand, and stores
the result from destination operand into low quad word of destination operand,
and the result from source operand into high quad word of destination operand.
All these instructions need the destination operand to be SSE register, source
operand can be SSE register or 128-bit memory location.
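
<div class="p"><!----></div>
Two of these instructions might be written as follows; the operands are picked only for illustration:

<pre>
&nbsp;&nbsp;&nbsp;&nbsp;addsubps&nbsp;xmm0,xmm1&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;;&nbsp;subtract&nbsp;first&nbsp;and&nbsp;third,&nbsp;add&nbsp;second&nbsp;and&nbsp;fourth&nbsp;pairs
&nbsp;&nbsp;&nbsp;&nbsp;haddpd&nbsp;xmm0,dqword&nbsp;[ebx]&nbsp;&nbsp;;&nbsp;low&nbsp;=&nbsp;sum&nbsp;within&nbsp;xmm0,&nbsp;high&nbsp;=&nbsp;sum&nbsp;within&nbsp;memory

</pre>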

<div class="p"><!----></div>
<a class="a" id="monitor"></a>
<tt>monitor</tt> sets up an address range for monitoring of write-back stores. It
needs its three operands to be the EAX, ECX and EDX registers, in that order.

<div class="p"><!----></div>
<a class="a" id="mwait"></a>
<tt>mwait</tt> waits for a write-back store to the address range set up by the
<tt>monitor</tt> instruction.
It uses two operands with additional parameters, first being the EAX and second
the ECX register.
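
<div class="p"><!----></div>
The two instructions are normally used as a pair. In this sketch the registers are assumed to be already loaded with suitable values:

<pre>
&nbsp;&nbsp;&nbsp;&nbsp;monitor&nbsp;eax,ecx,edx&nbsp;&nbsp;;&nbsp;set&nbsp;up&nbsp;the&nbsp;monitored&nbsp;address&nbsp;range
&nbsp;&nbsp;&nbsp;&nbsp;mwait&nbsp;eax,ecx&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;;&nbsp;wait&nbsp;for&nbsp;a&nbsp;store&nbsp;to&nbsp;that&nbsp;range

</pre>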

<div class="p"><!----></div>
The functionality of SSE3 is further extended by the set of Supplemental
SSE3 instructions (SSSE3). They generally follow the same rules for operands
as all the MMX operations extended by SSE.

<div class="p"><!----></div>

<a class="a" id="phaddw"></a>
<a class="a" id="phaddd"></a>
<a class="a" id="phaddsw"></a>
<a class="a" id="phsubw"></a>
<a class="a" id="phsubd"></a>
<a class="a" id="phsubsw"></a>
<tt>phaddw</tt> and <tt>phaddd</tt> perform the horizontal addition of the pairs of
adjacent values from both the source and destination operand, and store the
sums into the destination (sums from the destination operand go into the lower
part of the destination register, sums from the source operand into the upper
part). They operate on 16-bit or 32-bit chunks, respectively.
<tt>phaddsw</tt> performs the same operation on signed 16-bit packed values, but the
result of each addition is saturated. <tt>phsubw</tt> and <tt>phsubd</tt> analogously
perform the horizontal subtraction of 16-bit or 32-bit packed values, and
<tt>phsubsw</tt> performs the horizontal subtraction of signed 16-bit packed values
with saturation.
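
<div class="p"><!----></div>
As an illustration with arbitrarily chosen registers, one SSE and one MMX form:

<pre>
&nbsp;&nbsp;&nbsp;&nbsp;phaddw&nbsp;xmm0,xmm1&nbsp;&nbsp;;&nbsp;add&nbsp;adjacent&nbsp;pairs&nbsp;of&nbsp;16-bit&nbsp;values
&nbsp;&nbsp;&nbsp;&nbsp;phsubsw&nbsp;mm0,mm1&nbsp;&nbsp;&nbsp;;&nbsp;subtract&nbsp;adjacent&nbsp;pairs&nbsp;with&nbsp;saturation

</pre>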

<div class="p"><!----></div>
<a class="a" id="pabsb"></a>
<a class="a" id="pabsw"></a>
<a class="a" id="pabsd"></a>
<tt>pabsb</tt>, <tt>pabsw</tt> and <tt>pabsd</tt> calculate the absolute value of each
packed signed value in source operand and store the results into the destination
register. They operate on 8-bit, 16-bit and 32-bit elements respectively.

<div class="p"><!----></div>
<a class="a" id="pmaddubsw"></a>
<tt>pmaddubsw</tt> multiplies signed 8-bit values from the source operand with the
corresponding unsigned 8-bit values from the destination operand to produce
intermediate 16-bit values, and every adjacent pair of those intermediate
values is then added horizontally and those 16-bit sums are stored into the
destination operand.

<div class="p"><!----></div>
<a class="a" id="pmulhrsw"></a>
<tt>pmulhrsw</tt> multiplies corresponding 16-bit integers from the source and
destination operand to produce intermediate 32-bit values, and the 16 bits
next to the highest bit of each of those values are then rounded and packed
into the destination operand.

<div class="p"><!----></div>
<a class="a" id="pshufb"></a>
<tt>pshufb</tt> shuffles the bytes in the destination operand according to the
mask provided by source operand - each of the bytes in source operand holds
the index of the destination byte that is moved into the corresponding position,
or zeroes that position when its highest bit is set.

<div class="p"><!----></div>
<a class="a" id="psignb"></a>
<a class="a" id="psignw"></a>
<a class="a" id="psignd"></a>
<tt>psignb</tt>, <tt>psignw</tt> and <tt>psignd</tt> operate on 8-bit, 16-bit or
32-bit integers in destination operand, depending on the signs of the values
in the source. If the value in source is negative, the corresponding value in
the destination register is negated, if the value in source is positive, the
corresponding value in destination is left unchanged, and if the
value in source is zero, the value in destination is zeroed, too.
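
<div class="p"><!----></div>
A one-line sketch, with registers chosen arbitrarily:

<pre>
&nbsp;&nbsp;&nbsp;&nbsp;psignw&nbsp;xmm0,xmm1&nbsp;&nbsp;;&nbsp;negate,&nbsp;keep&nbsp;or&nbsp;zero&nbsp;each&nbsp;word&nbsp;of&nbsp;xmm0&nbsp;by&nbsp;sign&nbsp;of&nbsp;xmm1

</pre>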

<div class="p"><!----></div>
<a class="a" id="palifnr"></a>
<tt>palignr</tt> appends the source operand to the destination operand to form the
intermediate value of twice the size, and then extracts into the destination
register the 64 or 128 bits that are right-aligned to the byte offset
specified by the third operand, which should be an 8-bit immediate value. This
is the only SSSE3 instruction that takes three arguments.
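
<div class="p"><!----></div>
For instance, with an offset of four bytes (the registers and the offset are only an example):

<pre>
&nbsp;&nbsp;&nbsp;&nbsp;palignr&nbsp;xmm0,xmm1,4&nbsp;&nbsp;;&nbsp;take&nbsp;16&nbsp;bytes&nbsp;starting&nbsp;at&nbsp;byte&nbsp;4&nbsp;of&nbsp;xmm0:xmm1

</pre>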

<div class="p"><!----></div>
     <a id="tth_sEc2.1.18"></a><h3>
2.1.18&nbsp;&nbsp;AMD 3DNow! instructions</h3>
The 3DNow! extension adds new MMX instructions to those described in <a href="#sec:MMX_instructions">2.1.14</a>,
and introduces operation on the 64-bit packed floating point values, each
consisting of two single precision floating point values.

<div class="p"><!----></div>

<a class="a" id="pavgusb"></a>
<a class="a" id="pmulhrw"></a>
<a class="a" id="pi2fd"></a>
<a class="a" id="pf2id"></a>
<a class="a" id="pi2fw"></a>
<a class="a" id="pf2iw"></a>
<a class="a" id="pfadd"></a>
<a class="a" id="pfsub"></a>
<a class="a" id="pfsubr"></a>
<a class="a" id="pfmul"></a>
<a class="a" id="pfacc"></a>

<a class="a" id="pfnacc"></a>
<a class="a" id="pfpnacc"></a>
<a class="a" id="pfmax"></a>
<a class="a" id="pfmin"></a>
<a class="a" id="pswapd"></a>
<a class="a" id="pfrcp"></a>

<a class="a" id="pfrsqrt"></a>
<a class="a" id="pfrcpit1"></a>
<a class="a" id="pfrsqit1"></a>
<a class="a" id="pfrcpit2"></a>
<a class="a" id="pfcmpeq"></a>
<a class="a" id="pfcmpge"></a>
<a class="a" id="pfcmpgt"></a>
These instructions follow the same rules as the general MMX operations, the
destination operand should be an MMX register, the source operand can be an MMX
register or 64-bit memory location.
<tt>pavgusb</tt> computes the rounded averages
of packed unsigned bytes. <tt>pmulhrw</tt> performs a signed multiplication of the packed
words, rounds the high word of each double word result and stores them in the
destination operand. <tt>pi2fd</tt> converts packed double word integers into
packed floating point values. <tt>pf2id</tt> converts packed floating point values
into packed double word integers using truncation. <tt>pi2fw</tt> converts packed
word integers into packed floating point values, only low words of each
double word in source operand are used. <tt>pf2iw</tt> converts packed floating
point values to packed word integers, results are extended to double words
using the sign extension. <tt>pfadd</tt> adds packed floating point values. <tt>pfsub</tt>
and <tt>pfsubr</tt> subtract packed floating point values, the first one subtracts
source values from destination values, the second one subtracts destination
values from the source values. <tt>pfmul</tt> multiplies packed floating point
values. <tt>pfacc</tt> adds the low and high floating point values of the destination
operand, storing the result in the low double word of destination, and adds
the low and high floating point values of the source operand, storing the
result in the high double word of destination. <tt>pfnacc</tt> subtracts the high
floating point value of the destination operand from the low, storing the
result in the low double word of destination, and subtracts the high floating
point value of the source operand from the low, storing the result in the high
double word of destination. <tt>pfpnacc</tt> subtracts the high floating point value
of the destination operand from the low, storing the result in the low double
word of destination, and adds the low and high floating point values of the
source operand, storing the result in the high double word of destination.
<tt>pfmax</tt> and <tt>pfmin</tt> compute the maximum and minimum of floating point values.
<tt>pswapd</tt> reverses the high and low double word of the source operand. <tt>pfrcp</tt>
returns estimates of the reciprocals of floating point values from the
source operand, <tt>pfrsqrt</tt> returns estimates of the reciprocal square
................................................................................
all bits or zeroes all bits of the corresponding data element in the
destination operand according to the result of comparison, first checks
whether values are equal, second checks whether destination value is greater
or equal to source value, third checks whether destination value is greater
than source value.
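
<div class="p"><!----></div>
Two of the operations above could be written like this, assuming ESI points to a pair of single precision values:

<pre>
&nbsp;&nbsp;&nbsp;&nbsp;pfadd&nbsp;mm0,mm1&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;;&nbsp;add&nbsp;two&nbsp;pairs&nbsp;of&nbsp;floats
&nbsp;&nbsp;&nbsp;&nbsp;pfcmpge&nbsp;mm0,qword&nbsp;[esi]&nbsp;&nbsp;;&nbsp;set&nbsp;mask&nbsp;where&nbsp;mm0&nbsp;value&nbsp;is&nbsp;greater&nbsp;or&nbsp;equal

</pre>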

<div class="p"><!----></div>
<a class="a" id="prefetch"></a>
<a class="a" id="prefetchw"></a>
<tt>prefetch</tt> and <tt>prefetchw</tt> load the line of data from memory that contains
the byte specified with the operand into the data cache. The <tt>prefetchw</tt> instruction
should be used when the data in the cache line is expected to be modified,
otherwise the <tt>prefetch</tt> instruction should be used. The operand should be an
8-bit memory location.
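
<div class="p"><!----></div>
For example, assuming ESI points into a buffer that is about to be written:

<pre>
&nbsp;&nbsp;&nbsp;&nbsp;prefetchw&nbsp;byte&nbsp;[esi]&nbsp;&nbsp;;&nbsp;fetch&nbsp;cache&nbsp;line&nbsp;that&nbsp;will&nbsp;be&nbsp;modified

</pre>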

<div class="p"><!----></div>
<a class="a" id="femms"></a>
<tt>femms</tt> performs a fast clear of MMX state. It has no operands.

<div class="p"><!----></div>
     <a id="tth_sEc2.1.19"></a><h3>
2.1.19&nbsp;&nbsp;The x86-64 long mode instructions</h3>

<div class="p"><!----></div>
The AMD64 and EM64T architectures (we will use the common name x86-64 for them
................................................................................

<div class="p"><!----></div>

<div class="p"><!----></div>
<a id="tth_tAb2.4">
</a> 
<div style="text-align:center">
<table border="1" class="tabular">
<tr><td align="center">Type </td><td colspan="4" align="center">General </td><td align="center">SSE </td><td align="center">AVX </td></tr>
<tr><td align="center">Bits </td><td align="center">8 </td><td align="center">16 </td><td align="center">32 </td><td align="center">64 </td><td align="center">128 </td><td align="center">256 </td></tr><tr><td></td></tr>
<tr><td align="center"></td><td align="center"></td><td align="center"></td><td align="center"></td><td align="center"><tt>rax</tt> </td><td align="center"></td><td align="center"></td></tr>
<tr><td align="center"></td><td align="center"></td><td align="center"></td><td align="center"></td><td align="center"><tt>rcx</tt> </td><td align="center"></td><td align="center"></td></tr>
<tr><td align="center"></td><td align="center"></td><td align="center"></td><td align="center"></td><td align="center"><tt>rdx</tt> </td><td align="center"></td><td align="center"></td></tr>
<tr><td align="center"></td><td align="center"></td><td align="center"></td><td align="center"></td><td align="center"><tt>rbx</tt> </td><td align="center"></td><td align="center"></td></tr>
<tr><td align="center"></td><td align="center"><tt>spl</tt> </td><td align="center"></td><td align="center"></td><td align="center"><tt>rsp</tt> </td><td align="center"></td><td align="center"></td></tr>
................................................................................
<div class="p"><!----></div>
If any operation is performed on the 32-bit general registers in long mode,
the upper 32 bits of the 64-bit registers containing them are filled with
zeros. This is unlike the operations on 16-bit or 8-bit portions of those
registers, which preserve the upper bits.
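
<div class="p"><!----></div>
The difference can be seen in a short sequence like the following (a sketch for 64-bit code, the values are arbitrary):

<pre>
&nbsp;&nbsp;&nbsp;&nbsp;mov&nbsp;rax,-1&nbsp;&nbsp;;&nbsp;rax&nbsp;=&nbsp;0FFFFFFFFFFFFFFFFh
&nbsp;&nbsp;&nbsp;&nbsp;mov&nbsp;ax,0&nbsp;&nbsp;&nbsp;&nbsp;;&nbsp;rax&nbsp;=&nbsp;0FFFFFFFFFFFF0000h,&nbsp;upper&nbsp;bits&nbsp;preserved
&nbsp;&nbsp;&nbsp;&nbsp;mov&nbsp;eax,1&nbsp;&nbsp;&nbsp;;&nbsp;rax&nbsp;=&nbsp;1,&nbsp;upper&nbsp;32&nbsp;bits&nbsp;zeroed

</pre>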

<div class="p"><!----></div>
<a class="a" id="cdqe"></a>
<a class="a" id="cqo"></a>
<a class="a" id="movsxd"></a>
Three new type conversion instructions are available. The <tt>cdqe</tt> sign extends
the double word in EAX into quad word and stores the result in RAX register.
<tt>cqo</tt> sign extends the quad word in RAX into double quad word and stores the
extra bits in the RDX register. These instructions have no operands.
<tt>movsxd</tt> sign extends the double word source operand, being either the 32-bit register
or memory, into 64-bit destination operand, which has to be register.
No analogous instruction is needed for the zero extension, since it is done
automatically by any operations on 32-bit registers, as noted in previous
................................................................................
indirect far jumps and calls allow any operands that were allowed by the x86
architecture and also 80-bit memory operand is allowed (though only EM64T seems
to implement such variant), with the first eight bytes defining the offset and
two last bytes specifying the selector. The direct far jumps and calls are not
allowed in long mode.

<div class="p"><!----></div>
<a class="a" id="movsq"></a>
<a class="a" id="cmpsq"></a>
<a class="a" id="scasq"></a>
<a class="a" id="lodsq"></a>
<a class="a" id="stosq"></a>
The I/O instructions, <tt>in</tt>, <tt>out</tt>, <tt>ins</tt> and <tt>outs</tt> are the exceptional
instructions that are not extended to accept quad word operands in long mode.
But all other string operations are, and there are new short forms <tt>movsq</tt>,
<tt>cmpsq</tt>, <tt>scasq</tt>, <tt>lodsq</tt> and <tt>stosq</tt> introduced for the variants of string
operations for 64-bit string elements. The RSI and RDI registers are used by
default to address the string elements.
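
<div class="p"><!----></div>
A block copy might then look as follows, assuming RCX holds the element count, RSI and RDI hold the source and destination addresses, and the direction flag is clear:

<pre>
&nbsp;&nbsp;&nbsp;&nbsp;rep&nbsp;movsq&nbsp;&nbsp;;&nbsp;copy&nbsp;RCX&nbsp;quad&nbsp;words&nbsp;from&nbsp;[rsi]&nbsp;to&nbsp;[rdi]

</pre>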

<div class="p"><!----></div>
................................................................................
implement such variant). The <tt>lds</tt> and <tt>les</tt> are disallowed in long mode.

<div class="p"><!----></div>
The system instructions like <tt>lgdt</tt> which required the 48-bit memory operand,
in long mode require the 80-bit memory operand.

<div class="p"><!----></div>
<a class="a" id="cmpxchg16b"></a>
The <tt>cmpxchg16b</tt> is the 64-bit equivalent of <tt>cmpxchg8b</tt> instruction, it uses
the double quad word memory operand and 64-bit registers to perform the analogous operation.
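
<div class="p"><!----></div>
As a sketch, assuming RSI points to a 16-byte aligned location, RDX:RAX holds the expected value and RCX:RBX the replacement:

<pre>
&nbsp;&nbsp;&nbsp;&nbsp;lock&nbsp;cmpxchg16b&nbsp;dqword&nbsp;[rsi]&nbsp;&nbsp;;&nbsp;store&nbsp;RCX:RBX&nbsp;if&nbsp;memory&nbsp;equals&nbsp;RDX:RAX

</pre>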

<div class="p"><!----></div>
<a class="a" id="fxsave64"></a>
<a class="a" id="fxrstor64"></a>
The <tt>fxsave64</tt> and <tt>fxrstor64</tt> are new variants of <tt>fxsave</tt> and <tt>fxrstor</tt>
instructions, available only in long mode, which use a different format of
storage area in order to store some pointers in full 64-bit size.

<div class="p"><!----></div>
<a class="a" id="swapgs"></a>
<tt>swapgs</tt> is the new instruction, which swaps the contents of GS register and
the KernelGSbase model-specific register (MSR address 0C0000102h).

<div class="p"><!----></div>

<a class="a" id="syscall"></a>
<a class="a" id="sysret"></a>
<a class="a" id="sysexitq"></a>
<a class="a" id="sysretq"></a>
<tt>syscall</tt> and <tt>sysret</tt> are a pair of new instructions that provide the
functionality similar to <tt>sysenter</tt> and <tt>sysexit</tt> in long mode, where the
latter pair is disallowed. The <tt>sysexitq</tt> and <tt>sysretq</tt> mnemonics provide the
64-bit versions of <tt>sysexit</tt> and <tt>sysret</tt> instructions.

<div class="p"><!----></div>
<a class="a" id="rdmsrq"></a>
<a class="a" id="wrmsrq"></a>
The <tt>rdmsrq</tt> and <tt>wrmsrq</tt> mnemonics are the 64-bit variants of the <tt>rdmsr</tt>
and <tt>wrmsr</tt> instructions.

<div class="p"><!----></div>
     <a id="tth_sEc2.1.20"></a><h3>
2.1.20&nbsp;&nbsp;SSE4 instructions</h3>

<div class="p"><!----></div>
................................................................................
<div class="p"><!----></div>
The SSE4.1 instructions mostly follow the same rules for operands, as
the basic SSE operations, so they require destination operand to be SSE
register and source operand to be 128-bit memory location or SSE register,
and some operations require a third operand, the 8-bit immediate value.

<div class="p"><!----></div>

<a class="a" id="pmulld"></a>
<a class="a" id="pmuldq"></a>
<a class="a" id="pminsb"></a>
<a class="a" id="pmaxsb"></a>
<a class="a" id="pminuw"></a>
<a class="a" id="pmaxuw"></a>
<a class="a" id="pminud"></a>
<a class="a" id="pmaxud"></a>
<a class="a" id="pminsd"></a>
<a class="a" id="pmaxsd"></a>
<tt>pmulld</tt> performs a signed multiplication of the packed double words and
stores the low double words of the results in the destination operand.
<tt>pmuldq</tt> performs two signed multiplications of the corresponding double
words taken from the low half of each quad word of operands, and stores the results as
packed quad words into the destination register. <tt>pminsb</tt> and <tt>pmaxsb</tt>
return the minimum or maximum values of packed signed bytes, <tt>pminuw</tt> and
<tt>pmaxuw</tt> return the minimum and maximum values of packed unsigned words,
<tt>pminud</tt>, <tt>pmaxud</tt>, <tt>pminsd</tt> and <tt>pmaxsd</tt> return minimum or maximum values
of packed unsigned or signed double words. These instructions complement the
instructions computing packed minimum or maximum introduced by SSE.

<div class="p"><!----></div>
<a class="a" id="ptest"></a>
<a class="a" id="pcmpeqq"></a>
<tt>ptest</tt> sets the ZF flag to one when the result of bitwise AND of
both operands is zero, and zeroes the ZF otherwise. It also sets CF flag
to one, when the result of bitwise AND of the source operand with
the bitwise NOT of the destination operand is zero, and zeroes the CF otherwise.
<tt>pcmpeqq</tt> compares packed quad words for equality, and fills the
corresponding elements of destination operand with either ones or zeros,
depending on the result of comparison.
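
<div class="p"><!----></div>
A common idiom is to test a register against itself; this and the comparison are shown here with arbitrary registers:

<pre>
&nbsp;&nbsp;&nbsp;&nbsp;ptest&nbsp;xmm0,xmm0&nbsp;&nbsp;&nbsp;&nbsp;;&nbsp;ZF&nbsp;=&nbsp;1&nbsp;when&nbsp;xmm0&nbsp;is&nbsp;all&nbsp;zero&nbsp;bits
&nbsp;&nbsp;&nbsp;&nbsp;pcmpeqq&nbsp;xmm1,xmm2&nbsp;&nbsp;;&nbsp;all&nbsp;ones&nbsp;in&nbsp;each&nbsp;quad&nbsp;word&nbsp;that&nbsp;is&nbsp;equal

</pre>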

<div class="p"><!----></div>
<a class="a" id="packusdw"></a>
<tt>packusdw</tt> converts packed signed double words from both the source and
destination operand into the unsigned words using saturation, and stores
the eight resulting word values into the destination register.

<div class="p"><!----></div>
<a class="a" id="phminposuw"></a>
<tt>phminposuw</tt> finds the minimum unsigned word value in source operand
and places it into the lowest word of destination operand, stores the index of
that word in bits 16-18, and sets the remaining upper bits of destination to zero.

<div class="p"><!----></div>

<a class="a" id="roundps"></a>
<a class="a" id="roundss"></a>
<a class="a" id="roundpd"></a>
<a class="a" id="roundsd"></a>
<tt>roundps</tt>, <tt>roundss</tt>, <tt>roundpd</tt> and <tt>roundsd</tt> perform the rounding of
packed or individual floating point value of single or double precision,
using the rounding mode specified by the third operand.

<pre>
&nbsp;&nbsp;&nbsp;&nbsp;roundsd&nbsp;xmm0,xmm1,0011b&nbsp;;&nbsp;round&nbsp;toward&nbsp;zero

</pre>

<div class="p"><!----></div>
<a class="a" id="dpps"></a>
<a class="a" id="dppd"></a>

<a class="a" id="mpsadbw"></a>
<a class="a" id="roundps"></a>
<tt>dpps</tt> calculates dot product of packed single precision floating point
values, that is it multiplies the corresponding pairs of values from source and
destination operand and then sums the products up. The high four bits of the
8-bit immediate third operand control which products are calculated and taken
to the sum, and the low four bits control, into which elements of destination
the resulting dot product is copied (the other elements are filled with zero).
<tt>dppd</tt> calculates dot product of packed double precision floating point values.
The bits 4 and 5 of third operand control, which products are calculated and
................................................................................
at the position one byte after the position of previous block. The four bytes
from the source stay the same each time. This way eight sums of absolute
differences are calculated and stored as packed word values into the
destination operand. The instructions described in this paragraph follow the
same rules for operands, as <tt>roundps</tt> instruction.
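
<div class="p"><!----></div>
For instance, a full four-element dot product stored only into the lowest element could be requested like this (registers chosen arbitrarily):

<pre>
&nbsp;&nbsp;&nbsp;&nbsp;dpps&nbsp;xmm0,xmm1,11110001b&nbsp;&nbsp;;&nbsp;sum&nbsp;all&nbsp;four&nbsp;products&nbsp;into&nbsp;lowest&nbsp;element

</pre>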

<div class="p"><!----></div>

<a class="a" id="blendps"></a>
<a class="a" id="blendvps"></a>
<a class="a" id="blendpd"></a>
<a class="a" id="blendvpd"></a>
<tt>blendps</tt>, <tt>blendvps</tt>, <tt>blendpd</tt> and <tt>blendvpd</tt> conditionally copy the
values from source operand into the destination operand, depending on the bits
of the mask provided by third operand. If a mask bit is set, the corresponding
element of source is copied into the same place in destination, otherwise this
position in destination is left unchanged. The rules for the first two operands
are the same, as for general SSE instructions. <tt>blendps</tt> and <tt>blendpd</tt> need
third operand to be 8-bit immediate, and they operate on single or double
precision values, respectively. <tt>blendvps</tt> and <tt>blendvpd</tt> require third operand
................................................................................

<pre>
&nbsp;&nbsp;&nbsp;&nbsp;blendvps&nbsp;xmm3,xmm7,xmm0&nbsp;;&nbsp;blend&nbsp;according&nbsp;to&nbsp;mask

</pre>

<div class="p"><!----></div>
<a class="a" id="pblendw"></a>
<a class="a" id="pblendvb"></a>
<tt>pblendw</tt> conditionally copies word elements from the source operand into the
destination, depending on the bits of mask provided by third operand, which
needs to be 8-bit immediate value. <tt>pblendvb</tt> conditionally copies byte
elements from the source operand into destination, depending on mask defined
by the third operand, which has to be XMM0 register. These instructions follow
the same rules for operands as <tt>blendps</tt> and <tt>blendvps</tt> instructions,
respectively.
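
<div class="p"><!----></div>
Both variants might be used as follows; the mask value and registers other than XMM0 are only an example:

<pre>
&nbsp;&nbsp;&nbsp;&nbsp;pblendw&nbsp;xmm0,xmm1,00001111b&nbsp;&nbsp;;&nbsp;copy&nbsp;four&nbsp;low&nbsp;words&nbsp;from&nbsp;xmm1
&nbsp;&nbsp;&nbsp;&nbsp;pblendvb&nbsp;xmm2,xmm1,xmm0&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;;&nbsp;copy&nbsp;bytes&nbsp;selected&nbsp;by&nbsp;mask&nbsp;in&nbsp;xmm0

</pre>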

<div class="p"><!----></div>
<a class="a" id="insertps"></a>
<tt>insertps</tt> inserts a single precision floating point value taken from the
position in source operand specified by bits 6-7 of third operand into location
in destination register selected by bits 4-5 of third operand. Additionally,
the low four bits of third operand control, which elements in destination
register will be set to zero. The first two operands follow the same rules as
for the general SSE operation, the third operand should be 8-bit immediate.
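
<div class="p"><!----></div>
As an illustration of the bit fields in the third operand (the registers are arbitrary):

<pre>
&nbsp;&nbsp;&nbsp;&nbsp;insertps&nbsp;xmm0,xmm1,00010000b&nbsp;&nbsp;;&nbsp;lowest&nbsp;value&nbsp;of&nbsp;xmm1&nbsp;into&nbsp;second&nbsp;position

</pre>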

<div class="p"><!----></div>
<a class="a" id="extractps"></a>
<tt>extractps</tt> extracts a single precision floating point value taken from the
location in source operand specified by low two bits of third operand, and
stores it into the destination operand. The destination can be a 32-bit memory
value or general purpose register, the source operand must be SSE register,
and the third operand should be 8-bit immediate value.

<pre>
&nbsp;&nbsp;&nbsp;&nbsp;extractps&nbsp;edx,xmm3,3&nbsp;;&nbsp;extract&nbsp;the&nbsp;highest&nbsp;value

</pre>

<div class="p"><!----></div>
<a class="a" id="pinsrb"></a>
<a class="a" id="pinsrd"></a>
<a class="a" id="pinsrq"></a>
<tt>pinsrb</tt>, <tt>pinsrd</tt> and <tt>pinsrq</tt> copy a byte, double word or quad word from
the source operand into the location of destination operand determined by the
third operand. The destination operand has to be SSE register, the source
operand can be a memory location of appropriate size, or the 32-bit general
purpose register (but 64-bit general purpose register for <tt>pinsrq</tt>, which is
only available in long mode), and the third operand has to be 8-bit immediate
value. These instructions complement the <tt>pinsrw</tt> instruction operating on SSE
register destination, which was introduced by SSE2.
................................................................................

<pre>
&nbsp;&nbsp;&nbsp;&nbsp;pinsrd&nbsp;xmm4,eax,1&nbsp;;&nbsp;insert&nbsp;double&nbsp;word&nbsp;into&nbsp;second&nbsp;position

</pre>

<div class="p"><!----></div>
<a class="a" id="pextrb"></a>
<a class="a" id="pextrw"></a>
<a class="a" id="pextrd"></a>
<a class="a" id="pextrq"></a>
<tt>pextrb</tt>, <tt>pextrw</tt>, <tt>pextrd</tt> and <tt>pextrq</tt> copy a byte, word, double word or
quad word from the location in source operand specified by third operand, into
the destination. The source operand should be SSE register, the third operand
should be 8-bit immediate, and the destination operand can be memory location
of appropriate size, or the 32-bit general purpose register (but 64-bit general
purpose register for <tt>pextrq</tt>, which is only available in long mode). The
<tt>pextrw</tt> instruction with SSE register as source was already introduced by
SSE2, but SSE4 extends it to allow memory operand as destination.
................................................................................

<pre>
&nbsp;&nbsp;&nbsp;&nbsp;pextrw&nbsp;[ebx],xmm3,7&nbsp;;&nbsp;extract&nbsp;highest&nbsp;word&nbsp;into&nbsp;memory

</pre>

<div class="p"><!----></div>

<a class="a" id="pmovsxbw"></a>
<a class="a" id="pmovzxbw"></a>
<a class="a" id="pmovsxbd"></a>
<a class="a" id="pmovzxbd"></a>
<a class="a" id="pmovsxbq"></a>
<a class="a" id="pmovzxbq"></a>
<a class="a" id="pmovsxwd"></a>
<a class="a" id="pmovzxwd"></a>
<a class="a" id="pmovsxwq"></a>
<a class="a" id="pmovzxwq"></a>
<a class="a" id="pmovsxdq"></a>
<a class="a" id="pmovzxdq"></a>
<tt>pmovsxbw</tt> and <tt>pmovzxbw</tt> perform sign extension or zero extension of eight
byte values from the source operand into packed word values in destination
operand, which has to be SSE register. The source can be 64-bit memory or SSE
register - when it is register, only its low portion is used. <tt>pmovsxbd</tt> and
<tt>pmovzxbd</tt> perform sign extension or zero extension of the four byte values
from the source operand into packed double word values in destination operand,
the source can be 32-bit memory or SSE register. <tt>pmovsxbq</tt> and <tt>pmovzxbq</tt>
perform sign extension or zero extension of the two byte values from the
................................................................................
<pre>
&nbsp;&nbsp;&nbsp;&nbsp;pmovzxbq&nbsp;xmm0,word&nbsp;[si]&nbsp;&nbsp;;&nbsp;zero-extend&nbsp;bytes&nbsp;to&nbsp;quad&nbsp;words
&nbsp;&nbsp;&nbsp;&nbsp;pmovsxwq&nbsp;xmm0,xmm1&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;;&nbsp;sign-extend&nbsp;words&nbsp;to&nbsp;quad&nbsp;words

</pre>

<div class="p"><!----></div>
<a class="a" id="movntdqa"></a>
<tt>movntdqa</tt> loads double quad word from the source operand to the destination
using a non-temporal hint. The destination operand should be SSE register,
and the source operand should be 128-bit memory location.
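
<div class="p"><!----></div>
For example, assuming ESI holds a 16-byte aligned address:

<pre>
&nbsp;&nbsp;&nbsp;&nbsp;movntdqa&nbsp;xmm0,dqword&nbsp;[esi]&nbsp;&nbsp;;&nbsp;non-temporal&nbsp;128-bit&nbsp;load

</pre>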

<div class="p"><!----></div>
The SSE4.2, described below, adds not only some new operations on SSE
registers, but also introduces some completely new instructions operating on
general purpose registers only.

<div class="p"><!----></div>

<a class="a" id="pcmpistri"></a>
<a class="a" id="pcmpistrm"></a>
<a class="a" id="pcmpestri"></a>
<a class="a" id="pcmpestrm"></a>
<tt>pcmpistri</tt> compares two zero-ended (implicit length) strings provided in
its source and destination operand and generates an index stored to ECX;
<tt>pcmpistrm</tt> performs the same comparison and generates a mask stored to XMM0.
<tt>pcmpestri</tt> compares two strings of explicit lengths, with length provided
in EAX for the destination operand and in EDX for the source operand, and
generates an index stored to ECX; <tt>pcmpestrm</tt> performs the same comparison
and generates a mask stored to XMM0. The source and destination operand follow
the same rules as for general SSE instructions, the third operand should be
8-bit immediate value determining the details of performed operation - refer to
Intel documentation for information on those details.

<div class="p"><!----></div>
<a class="a" id="pcmpgtq"></a>
<tt>pcmpgtq</tt> compares packed quad words, and fills the corresponding elements of
destination operand with either ones or zeros, depending on whether the value
in destination is greater than the one in source, or not. This instruction
follows the same rules for operands as <tt>pcmpeqq</tt>.
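
<div class="p"><!----></div>
A one-line sketch with arbitrarily chosen registers:

<pre>
&nbsp;&nbsp;&nbsp;&nbsp;pcmpgtq&nbsp;xmm0,xmm1&nbsp;&nbsp;;&nbsp;all&nbsp;ones&nbsp;where&nbsp;quad&nbsp;word&nbsp;of&nbsp;xmm0&nbsp;is&nbsp;greater

</pre>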

<div class="p"><!----></div>
<a class="a" id="crc32"></a>
<tt>crc32</tt> accumulates a CRC32 value for the source operand starting with
initial value provided by destination operand, and stores the result in
destination. Unless in long mode, the destination operand should be a 32-bit
general purpose register, and the source operand can be a byte, word, or double
word register or memory location. In long mode the destination operand can
also be a 64-bit general purpose register, and the source operand in such case
can be a byte or quad word register or memory location.

................................................................................
&nbsp;&nbsp;&nbsp;&nbsp;crc32&nbsp;eax,dl&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;;&nbsp;accumulate&nbsp;CRC32&nbsp;on&nbsp;byte&nbsp;value
&nbsp;&nbsp;&nbsp;&nbsp;crc32&nbsp;eax,word&nbsp;[ebx]&nbsp;&nbsp;;&nbsp;accumulate&nbsp;CRC32&nbsp;on&nbsp;word&nbsp;value
&nbsp;&nbsp;&nbsp;&nbsp;crc32&nbsp;rax,qword&nbsp;[rbx]&nbsp;;&nbsp;accumulate&nbsp;CRC32&nbsp;on&nbsp;quad&nbsp;word&nbsp;value

</pre>

<div class="p"><!----></div>
<a class="a" id="popcnt"></a>
<tt>popcnt</tt> calculates the number of bits set in the source operand, which can
be 16-bit, 32-bit, or 64-bit general purpose register or memory location,
and stores this count in the destination operand, which has to be register of
the same size as source operand. The 64-bit variant is available only in long
mode.

<pre>
&nbsp;&nbsp;&nbsp;&nbsp;popcnt&nbsp;ecx,eax&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;;&nbsp;count&nbsp;bits&nbsp;set&nbsp;to&nbsp;1

</pre>

<div class="p"><!----></div>
<a class="a" id="lzcnt"></a>
The SSE4a extension, which also includes the <tt>popcnt</tt> instruction introduced
by SSE4.2, at the same time adds the <tt>lzcnt</tt> instruction, which follows the
same syntax, and calculates the count of leading zero bits in source operand
(if the source operand is all zero bits, the total number of bits in source
operand is stored in destination).
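
<div class="p"><!----></div>
Written in the same form as the <tt>popcnt</tt> example, with registers chosen for illustration:

<pre>
&nbsp;&nbsp;&nbsp;&nbsp;lzcnt&nbsp;ecx,eax&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;;&nbsp;count&nbsp;leading&nbsp;zero&nbsp;bits

</pre>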

<div class="p"><!----></div>
<a class="a" id="extrq"></a>
<tt>extrq</tt> extracts the sequence of bits from the low quad word of SSE register
provided as first operand and stores them at the low end of this register,
filling the remaining bits in the low quad word with zeros. The position of bit
string and its length can either be provided with two 8-bit immediate values
as second and third operand, or by SSE register as second operand (and there
is no third operand in such case), which should contain position value in bits
8-13 and length of bit string in bits 0-5.

................................................................................
<pre>
&nbsp;&nbsp;&nbsp;&nbsp;extrq&nbsp;xmm0,8,7&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;;&nbsp;extract&nbsp;8&nbsp;bits&nbsp;from&nbsp;position&nbsp;7
&nbsp;&nbsp;&nbsp;&nbsp;extrq&nbsp;xmm0,xmm5&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;;&nbsp;extract&nbsp;bits&nbsp;defined&nbsp;by&nbsp;register

</pre>

<div class="p"><!----></div>
<a class="a" id="insertq"></a>
<tt>insertq</tt> writes the sequence of bits from the low quad word of the source
operand into specified position in low quad word of the destination operand,
leaving the other bits in low quad word of destination intact. The position
where bits should be written and the length of bit string can either be
provided with two 8-bit immediate values as third and fourth operand, or by
the bit fields in source operand (and there are only two operands in such
case), which should contain position value in bits 72-77 and length of bit
string in bits 64-69.
................................................................................
<pre>
&nbsp;&nbsp;&nbsp;&nbsp;insertq&nbsp;xmm1,xmm0,4,2&nbsp;;&nbsp;insert&nbsp;4&nbsp;bits&nbsp;at&nbsp;position&nbsp;2
&nbsp;&nbsp;&nbsp;&nbsp;insertq&nbsp;xmm1,xmm0&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;;&nbsp;insert&nbsp;bits&nbsp;defined&nbsp;by&nbsp;register

</pre>

<div class="p"><!----></div>
<a class="a" id="movntss"></a>
<a class="a" id="movntsd"></a>
<tt>movntss</tt> and <tt>movntsd</tt> store single or double precision floating point
value from the source SSE register into 32-bit or 64-bit destination memory
location respectively, using non-temporal hint.
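
<div class="p"><!----></div>
Both stores could be written like this, assuming EDI points to a writable buffer:

<pre>
&nbsp;&nbsp;&nbsp;&nbsp;movntss&nbsp;dword&nbsp;[edi],xmm0&nbsp;&nbsp;&nbsp;&nbsp;;&nbsp;store&nbsp;low&nbsp;single&nbsp;precision&nbsp;value
&nbsp;&nbsp;&nbsp;&nbsp;movntsd&nbsp;qword&nbsp;[edi+8],xmm1&nbsp;&nbsp;;&nbsp;store&nbsp;low&nbsp;double&nbsp;precision&nbsp;value

</pre>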

<div class="p"><!----></div>
     <a id="tth_sEc2.1.21"></a><h3>
2.1.21&nbsp;&nbsp;AVX instructions</h3>

................................................................................
variant has a new syntax with three operands - the destination and two sources.
The destination and first source can be SSE registers, and second source can be
SSE register or memory. If the operation is performed on single pair of values,
the remaining bits of first source SSE register are copied into the
destination register.

<pre>
&nbsp;&nbsp;&nbsp;&nbsp;vsubss&nbsp;xmm0,xmm2,xmm3&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;;&nbsp;subtract&nbsp;two&nbsp;32-bit&nbsp;floats
&nbsp;&nbsp;&nbsp;&nbsp;vmulsd&nbsp;xmm0,xmm7,qword&nbsp;[esi]&nbsp;&nbsp;;&nbsp;multiply&nbsp;two&nbsp;64-bit&nbsp;floats

</pre>
In case of packed operations, each instruction can also operate on the 256-bit
data size when the AVX registers are specified instead of SSE registers, and
the size of memory operand is also doubled then.

................................................................................

</pre>
The promotion to new syntax according to the rules described above has been
applied to all the instructions from SSE extensions up to SSE4, with the
exceptions described below.

<div class="p"><!----></div>
<a class="a" id="vdppd"></a>
<tt>vdppd</tt> instruction has syntax extended to four operands, but it does not
have a 256-bit version.
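A possible use is sketched below; the registers and the immediate mask are
picked only for illustration (here the mask selects both products and places
their sum in the low quad word):

<pre>
&nbsp;&nbsp;&nbsp;&nbsp;vdppd&nbsp;xmm0,xmm1,xmm2,31h&nbsp;&nbsp;;&nbsp;dot&nbsp;product&nbsp;into&nbsp;low&nbsp;quad&nbsp;word

</pre>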

<div class="p"><!----></div>

<a class="a" id="vsqrtpd"></a>
<a class="a" id="vsqrtps"></a>
<a class="a" id="vrcpps"></a>
<a class="a" id="vrsqrtps"></a>
There are a few instructions, namely <tt>vsqrtpd</tt>, <tt>vsqrtps</tt>, <tt>vrcpps</tt> and
<tt>vrsqrtps</tt>, which can operate on 256-bit data size, but retained the syntax
with only two operands, because they use data from only one source:

<pre>
&nbsp;&nbsp;&nbsp;&nbsp;vsqrtpd&nbsp;ymm1,ymm0&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;;&nbsp;put&nbsp;square&nbsp;roots&nbsp;into&nbsp;other&nbsp;register

</pre>

<div class="p"><!----></div>
<a class="a" id="vroundpd"></a>
<a class="a" id="vroundps"></a>
In a similar way <tt>vroundpd</tt> and <tt>vroundps</tt> retained the syntax with three
operands, the last one being immediate value.

<pre>
&nbsp;&nbsp;&nbsp;&nbsp;vroundps&nbsp;ymm0,ymm1,0011b&nbsp;&nbsp;;&nbsp;round&nbsp;toward&nbsp;zero

</pre>

<div class="p"><!----></div>

<a class="a" id="vpcmpestri"></a>
<a class="a" id="vpcmpestrm"></a>
<a class="a" id="vpcmpistri"></a>
<a class="a" id="vpcmpistrm"></a>
<a class="a" id="vphminposuw"></a>
<a class="a" id="vpshufd"></a>
<a class="a" id="vpshufhw"></a>
<a class="a" id="vpshuflw"></a>
<a class="a" id="vcomiss"></a>
<a class="a" id="vcomisd"></a>
<a class="a" id="vcvtss2si"></a>
<a class="a" id="vcvtsd2si"></a>
<a class="a" id="vcvttss2si"></a>
<a class="a" id="vcvttsd2si"></a>
<a class="a" id="vextractps"></a>
<a class="a" id="vpextrb"></a>
<a class="a" id="vpextrw"></a>
<a class="a" id="vpextrd"></a>
<a class="a" id="vpextrq"></a>
<a class="a" id="vmovd"></a>
<a class="a" id="vmovq"></a>

<a class="a" id="vmovntdqa"></a>
<a class="a" id="vmaskmovdqu"></a>
<a class="a" id="vpmovmskb"></a>
<a class="a" id="vpmovsxbw"></a>
<a class="a" id="vpmovsxbd"></a>
<a class="a" id="vpmovsxbq"></a>
<a class="a" id="vpmovsxwd"></a>
<a class="a" id="vpmovsxwq"></a>
<a class="a" id="vpmovsxdq"></a>
<a class="a" id="vpmovzxbw"></a>
<a class="a" id="vpmovzxbd"></a>
<a class="a" id="vpmovzxbq"></a>
<a class="a" id="vpmovzxwd"></a>
<a class="a" id="vpmovzxwq"></a>
<a class="a" id="vpmovzxdq"></a>
Also some of the operations on packed integers kept their two-operand or
three-operand syntax while being promoted to AVX version. In such case these
instructions follow exactly the same rules for operands as their SSE
counterparts (since operations on packed integers do not have 256-bit variants
in AVX extension). These include <tt>vpcmpestri</tt>, <tt>vpcmpestrm</tt>, <tt>vpcmpistri</tt>,
<tt>vpcmpistrm</tt>, <tt>vphminposuw</tt>, <tt>vpshufd</tt>, <tt>vpshufhw</tt>, <tt>vpshuflw</tt>. And there are
more instructions that in AVX versions keep exactly the same syntax for
operands as the one from SSE, without any additional options: <tt>vcomiss</tt>,
<tt>vcomisd</tt>, <tt>vcvtss2si</tt>, <tt>vcvtsd2si</tt>, <tt>vcvttss2si</tt>, <tt>vcvttsd2si</tt>, <tt>vextractps</tt>,
<tt>vpextrb</tt>, <tt>vpextrw</tt>, <tt>vpextrd</tt>, <tt>vpextrq</tt>, <tt>vmovd</tt>, <tt>vmovq</tt>, <tt>vmovntdqa</tt>,
<tt>vmaskmovdqu</tt>, <tt>vpmovmskb</tt>, <tt>vpmovsxbw</tt>, <tt>vpmovsxbd</tt>, <tt>vpmovsxbq</tt>, <tt>vpmovsxwd</tt>,
<tt>vpmovsxwq</tt>, <tt>vpmovsxdq</tt>, <tt>vpmovzxbw</tt>, <tt>vpmovzxbd</tt>, <tt>vpmovzxbq</tt>, <tt>vpmovzxwd</tt>,
<tt>vpmovzxwq</tt> and <tt>vpmovzxdq</tt>.

<div class="p"><!----></div>

<a class="a" id="vcvtdq2ps"></a>
<a class="a" id="vcvtps2dq"></a>
<a class="a" id="vcvttps2dq"></a>
<a class="a" id="vmovaps"></a>
<a class="a" id="vmovapd"></a>
<a class="a" id="vmovups"></a>
<a class="a" id="vmovupd"></a>
<a class="a" id="vmovdqa"></a>
<a class="a" id="vmovdqu"></a>
<a class="a" id="vlddqu"></a>
<a class="a" id="vmovntps"></a>
<a class="a" id="vmovntpd"></a>
<a class="a" id="vmovntdq"></a>
<a class="a" id="vmovsldup"></a>
<a class="a" id="vmovshdup"></a>
<a class="a" id="vmovmskps"></a>
<a class="a" id="vmovmskpd"></a>
The move and conversion instructions have mostly been promoted to allow
256-bit size operands in addition to the 128-bit variant with syntax identical
to that from SSE version of the same instruction. Each of the
<tt>vcvtdq2ps</tt>, <tt>vcvtps2dq</tt> and <tt>vcvttps2dq</tt>,
<tt>vmovaps</tt>, <tt>vmovapd</tt>, <tt>vmovups</tt>, <tt>vmovupd</tt>,
<tt>vmovdqa</tt>, <tt>vmovdqu</tt>, <tt>vlddqu</tt>,
<tt>vmovntps</tt>, <tt>vmovntpd</tt>, <tt>vmovntdq</tt>,
<tt>vmovsldup</tt>, <tt>vmovshdup</tt>,
................................................................................

<pre>
&nbsp;&nbsp;&nbsp;&nbsp;vmovups&nbsp;[edi],ymm6&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;;&nbsp;store&nbsp;unaligned&nbsp;256-bit&nbsp;data

</pre>

<div class="p"><!----></div>
<a class="a" id="vmovddup"></a>
<tt>vmovddup</tt> has the identical 128-bit syntax as its SSE version, and it also
has a 256-bit version, which stores the duplicates of the lowest quad word
from the source operand in the lower half of destination operand, and in the
upper half of destination the duplicates of the low quad word from the upper
half of source. Both source and destination operands need then to be 256-bit
values.
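Both forms might look as follows (the operands are arbitrary examples):

<pre>
&nbsp;&nbsp;&nbsp;&nbsp;vmovddup&nbsp;xmm0,qword&nbsp;[esi]&nbsp;&nbsp;;&nbsp;two&nbsp;copies&nbsp;of&nbsp;64-bit&nbsp;value
&nbsp;&nbsp;&nbsp;&nbsp;vmovddup&nbsp;ymm0,ymm1&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;;&nbsp;duplicate&nbsp;low&nbsp;quad&nbsp;word&nbsp;of&nbsp;each&nbsp;half

</pre>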

<div class="p"><!----></div>
<a class="a" id="vmovlhps"></a>
<a class="a" id="vmovhlps"></a>
<tt>vmovlhps</tt> and <tt>vmovhlps</tt> have only 128-bit versions, and each takes three
operands, which all must be SSE registers. <tt>vmovlhps</tt> copies two single
precision values from the low quad word of second source register to the high
quad word of destination register, and copies the low quad word of first
source register into the low quad word of destination register. <tt>vmovhlps</tt>
copies two single precision values from the high quad word of second source
register to the low quad word of destination register, and copies the high
quad word of first source register into the high quad word of destination
register.
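As a sketch with arbitrarily chosen registers, the comments spell out where
each quad word of the destination comes from:

<pre>
&nbsp;&nbsp;&nbsp;&nbsp;vmovlhps&nbsp;xmm0,xmm1,xmm2&nbsp;&nbsp;;&nbsp;low:&nbsp;low&nbsp;of&nbsp;xmm1,&nbsp;high:&nbsp;low&nbsp;of&nbsp;xmm2
&nbsp;&nbsp;&nbsp;&nbsp;vmovhlps&nbsp;xmm0,xmm1,xmm2&nbsp;&nbsp;;&nbsp;low:&nbsp;high&nbsp;of&nbsp;xmm2,&nbsp;high:&nbsp;high&nbsp;of&nbsp;xmm1

</pre>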

<div class="p"><!----></div>

<a class="a" id="vmovlps"></a>
<a class="a" id="vmovhps"></a>
<a class="a" id="vmovlpd"></a>
<a class="a" id="vmovhpd"></a>
<tt>vmovlps</tt>, <tt>vmovhps</tt>, <tt>vmovlpd</tt> and <tt>vmovhpd</tt> have only 128-bit versions and
their syntax varies depending on whether memory operand is a destination or
source. When memory is destination, the syntax is identical to the one of
equivalent SSE instruction, and when memory is source, the instruction requires
three operands, first two being SSE registers and the third one 64-bit memory.
The value put into destination is then the value copied from first source with
either low or high quad word replaced with value from second source (the
memory operand).
................................................................................
<pre>
&nbsp;&nbsp;&nbsp;&nbsp;vmovhps&nbsp;[esi],xmm7&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;;&nbsp;store&nbsp;upper&nbsp;half&nbsp;to&nbsp;memory
&nbsp;&nbsp;&nbsp;&nbsp;vmovlps&nbsp;xmm0,xmm7,[ebx]&nbsp;&nbsp;;&nbsp;low&nbsp;from&nbsp;memory,&nbsp;rest&nbsp;from&nbsp;register

</pre>

<div class="p"><!----></div>
<a class="a" id="vmovss"></a>
<a class="a" id="vmovsd"></a>
<tt>vmovss</tt> and <tt>vmovsd</tt> have syntax identical to their SSE equivalents as long
as one of the operands is memory, while the versions that operate purely on
registers require three operands (each being SSE register). The value stored
in destination is then the value copied from first source with lowest data
element replaced with the lowest value from second source.

<pre>
&nbsp;&nbsp;&nbsp;&nbsp;vmovss&nbsp;xmm3,[edi]&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;;&nbsp;low&nbsp;from&nbsp;memory,&nbsp;rest&nbsp;zeroed
&nbsp;&nbsp;&nbsp;&nbsp;vmovss&nbsp;xmm0,xmm1,xmm2&nbsp;&nbsp;&nbsp;&nbsp;;&nbsp;one&nbsp;value&nbsp;from&nbsp;xmm2,&nbsp;three&nbsp;from&nbsp;xmm1

</pre>

<div class="p"><!----></div>

<a class="a" id="vcvtss2sd"></a>
<a class="a" id="vcvtsd2ss"></a>
<a class="a" id="vcvtsi2ss"></a>
<a class="a" id="vcvtsi2d"></a>
<tt>vcvtss2sd</tt>, <tt>vcvtsd2ss</tt>, <tt>vcvtsi2ss</tt> and <tt>vcvtsi2sd</tt> use the three-operand
syntax, where destination and first source are always SSE registers, and the
second source follows the same rules as the source in syntax of equivalent
SSE instruction. The value stored in destination is then the value copied from
first source with lowest data element replaced with the result of conversion.

<pre>
&nbsp;&nbsp;&nbsp;&nbsp;vcvtsi2sd&nbsp;xmm4,xmm4,ecx&nbsp;&nbsp;;&nbsp;32-bit&nbsp;integer&nbsp;to&nbsp;64-bit&nbsp;float
&nbsp;&nbsp;&nbsp;&nbsp;vcvtsi2ss&nbsp;xmm0,xmm0,rax&nbsp;&nbsp;;&nbsp;64-bit&nbsp;integer&nbsp;to&nbsp;32-bit&nbsp;float

</pre>

<div class="p"><!----></div>

<a class="a" id="vcvtdq2pd"></a>
<a class="a" id="vcvtps2pd"></a>
<a class="a" id="vcvtpd2dq"></a>
<a class="a" id="vcvttpd2dq"></a>
<a class="a" id="vcvtpd2ps"></a>
<tt>vcvtdq2pd</tt> and <tt>vcvtps2pd</tt> allow the same syntax as their SSE equivalents,
plus the new variants with AVX register as destination and SSE register or
128-bit memory as source. Analogously <tt>vcvtpd2dq</tt>, <tt>vcvttpd2dq</tt> and
<tt>vcvtpd2ps</tt>, in addition to variant with syntax identical to SSE version,
allow a variant with SSE register as destination and AVX register or 256-bit
memory as source.
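The variants mixing SSE and AVX operands could be written like this (the
registers and addresses are arbitrary; the size operators are given
explicitly to make the intended memory size visible):

<pre>
&nbsp;&nbsp;&nbsp;&nbsp;vcvtdq2pd&nbsp;ymm0,xmm1&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;;&nbsp;four&nbsp;32-bit&nbsp;integers&nbsp;to&nbsp;64-bit&nbsp;floats
&nbsp;&nbsp;&nbsp;&nbsp;vcvtps2pd&nbsp;ymm0,xword&nbsp;[esi]&nbsp;&nbsp;&nbsp;&nbsp;;&nbsp;four&nbsp;32-bit&nbsp;floats&nbsp;to&nbsp;64-bit&nbsp;floats
&nbsp;&nbsp;&nbsp;&nbsp;vcvtpd2ps&nbsp;xmm0,ymm1&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;;&nbsp;four&nbsp;64-bit&nbsp;floats&nbsp;to&nbsp;32-bit&nbsp;floats
&nbsp;&nbsp;&nbsp;&nbsp;vcvttpd2dq&nbsp;xmm0,yword&nbsp;[ebx]&nbsp;&nbsp;&nbsp;;&nbsp;four&nbsp;64-bit&nbsp;floats&nbsp;to&nbsp;integers,&nbsp;truncating

</pre>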

<div class="p"><!----></div>

<a class="a" id="vinsertps"></a>
<a class="a" id="vpinsrb"></a>
<a class="a" id="vpinsrw"></a>
<a class="a" id="vpinsrd"></a>
<a class="a" id="vpinsrq"></a>
<a class="a" id="vpblendw"></a>
<tt>vinsertps</tt>, <tt>vpinsrb</tt>, <tt>vpinsrw</tt>, <tt>vpinsrd</tt>, <tt>vpinsrq</tt> and <tt>vpblendw</tt> use
a syntax with four operands, where destination and first source have to be SSE
registers, and the third and fourth operand follow the same rules as second
and third operand in the syntax of equivalent SSE instruction. Value stored in
destination is the value copied from first source with some data elements
replaced with values extracted from the second source, analogously to the
operation of corresponding SSE instruction.

<pre>
&nbsp;&nbsp;&nbsp;&nbsp;vpinsrd&nbsp;xmm0,xmm0,eax,3&nbsp;&nbsp;;&nbsp;insert&nbsp;double&nbsp;word

</pre>

<div class="p"><!----></div>

<a class="a" id="vblendvps"></a>
<a class="a" id="vblendvpd"></a>
<a class="a" id="vpblendvb"></a>
<tt>vblendvps</tt>, <tt>vblendvpd</tt> and <tt>vpblendvb</tt> use a new syntax with four register
operands: destination, two sources and a mask, where second source can also be
a memory operand. <tt>vblendvps</tt> and <tt>vblendvpd</tt> have 256-bit variant, where
operands are AVX registers or 256-bit memory, as well as 128-bit variant,
which has operands being SSE registers or 128-bit memory. <tt>vpblendvb</tt> has only
a 128-bit variant. Value stored in destination is the value copied from the
first source with some data elements replaced, according to mask, by values
from the second source.
................................................................................

<pre>
&nbsp;&nbsp;&nbsp;&nbsp;vblendvps&nbsp;ymm3,ymm1,ymm2,ymm7&nbsp;&nbsp;;&nbsp;blend&nbsp;according&nbsp;to&nbsp;mask

</pre>

<div class="p"><!----></div>

<a class="a" id="vptest"></a>
<a class="a" id="vtestps"></a>
<a class="a" id="vtestpd"></a>
<tt>vptest</tt> allows the same syntax as its SSE version and also has a 256-bit
version, with both operands doubled in size. There are also two new
instructions, <tt>vtestps</tt> and <tt>vtestpd</tt>, which perform analogous tests, but only
of the sign bits of corresponding single precision or double precision values,
and set the ZF and CF accordingly. They follow the same syntax rules as
<tt>vptest</tt>.

<pre>
&nbsp;&nbsp;&nbsp;&nbsp;vptest&nbsp;ymm0,yword&nbsp;[ebx]&nbsp;&nbsp;;&nbsp;test&nbsp;256-bit&nbsp;values
&nbsp;&nbsp;&nbsp;&nbsp;vtestpd&nbsp;xmm0,xmm1&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;;&nbsp;test&nbsp;sign&nbsp;bits&nbsp;of&nbsp;64-bit&nbsp;floats

</pre>

<div class="p"><!----></div>
<a class="a" id="vbroadcastss"></a>
<a class="a" id="vbroadcastsd"></a>
<a class="a" id="vbroadcastf128"></a>
<tt>vbroadcastss</tt>, <tt>vbroadcastsd</tt> and <tt>vbroadcastf128</tt> are new instructions,
which broadcast the data element defined by source operand into all elements
of corresponding size in the destination register. <tt>vbroadcastss</tt> needs
source to be 32-bit memory and destination to be either SSE or AVX register.
<tt>vbroadcastsd</tt> requires 64-bit memory as source, and AVX register as
destination. <tt>vbroadcastf128</tt> requires 128-bit memory as source, and AVX
register as destination.

<pre>
&nbsp;&nbsp;&nbsp;&nbsp;vbroadcastss&nbsp;ymm0,dword&nbsp;[eax]&nbsp;&nbsp;;&nbsp;get&nbsp;eight&nbsp;copies&nbsp;of&nbsp;value

</pre>

<div class="p"><!----></div>
<a class="a" id="vinsertf128"></a>
<tt>vinsertf128</tt> is the new instruction, which takes four operands. The
destination and first source have to be AVX registers, second source can be
SSE register or 128-bit memory location, and fourth operand should be an
immediate value. It stores in destination the value obtained by taking
contents of first source and replacing one of its 128-bit units with value of
the second source. The lowest bit of fourth operand specifies at which
position that replacement is done (either 0 or 1).
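A short sketch, with operands chosen only for illustration:

<pre>
&nbsp;&nbsp;&nbsp;&nbsp;vinsertf128&nbsp;ymm0,ymm1,xmm2,1&nbsp;&nbsp;&nbsp;;&nbsp;replace&nbsp;upper&nbsp;128&nbsp;bits&nbsp;with&nbsp;xmm2
&nbsp;&nbsp;&nbsp;&nbsp;vinsertf128&nbsp;ymm0,ymm0,[esi],0&nbsp;&nbsp;;&nbsp;replace&nbsp;lower&nbsp;128&nbsp;bits&nbsp;from&nbsp;memory

</pre>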

<div class="p"><!----></div>
<a class="a" id="vextractf128"></a>
<tt>vextractf128</tt> is the new instruction with three operands. The destination
needs to be SSE register or 128-bit memory location, the source must be AVX
register, and the third operand should be an immediate value. It extracts
into destination one of the 128-bit units from source. The lowest bit of third
operand specifies which unit is extracted.
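For example, assuming arbitrary registers and a destination address in edi:

<pre>
&nbsp;&nbsp;&nbsp;&nbsp;vextractf128&nbsp;xmm0,ymm1,1&nbsp;&nbsp;&nbsp;;&nbsp;get&nbsp;upper&nbsp;128&nbsp;bits
&nbsp;&nbsp;&nbsp;&nbsp;vextractf128&nbsp;[edi],ymm1,0&nbsp;&nbsp;;&nbsp;store&nbsp;lower&nbsp;128&nbsp;bits&nbsp;to&nbsp;memory

</pre>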

<div class="p"><!----></div>
<a class="a" id="vmaskmovps"></a>
<a class="a" id="vmaskmovpd"></a>
<tt>vmaskmovps</tt> and <tt>vmaskmovpd</tt> are the new instructions with three operands
that selectively store in destination the elements from second source
depending on the sign bits of corresponding elements from first source. These
instructions can operate on either 128-bit data (SSE registers) or 256-bit
data (AVX registers). Either destination or second source has to be a memory
location of appropriate size, the two other operands should be registers.

<pre>
&nbsp;&nbsp;&nbsp;&nbsp;vmaskmovps&nbsp;[edi],xmm0,xmm5&nbsp;&nbsp;;&nbsp;conditionally&nbsp;store
&nbsp;&nbsp;&nbsp;&nbsp;vmaskmovpd&nbsp;ymm5,ymm0,[esi]&nbsp;&nbsp;;&nbsp;conditionally&nbsp;load

</pre>

<div class="p"><!----></div>
<a class="a" id="vpermilpd"></a>
<a class="a" id="vpermilps"></a>
<tt>vpermilpd</tt> and <tt>vpermilps</tt> are the new instructions with three operands
that permute the values from first source according to the control fields from
second source and put the result into destination operand. They accept
either three SSE registers or three AVX registers as operands, and the second
source can also be a memory of size equal to the registers used. In alternative
form the second source can be immediate value and then the first source
can be a memory location of the size equal to destination register.
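The forms described above might be used as follows; the registers, the
address and the immediate values are arbitrary choices made for this sketch:

<pre>
&nbsp;&nbsp;&nbsp;&nbsp;vpermilps&nbsp;ymm0,ymm1,ymm2&nbsp;&nbsp;&nbsp;;&nbsp;control&nbsp;fields&nbsp;taken&nbsp;from&nbsp;ymm2
&nbsp;&nbsp;&nbsp;&nbsp;vpermilps&nbsp;xmm0,[esi],1Bh&nbsp;&nbsp;&nbsp;;&nbsp;reverse&nbsp;order&nbsp;of&nbsp;four&nbsp;floats
&nbsp;&nbsp;&nbsp;&nbsp;vpermilpd&nbsp;ymm0,ymm1,0101b&nbsp;&nbsp;;&nbsp;swap&nbsp;doubles&nbsp;in&nbsp;each&nbsp;128-bit&nbsp;half

</pre>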

<div class="p"><!----></div>
<a class="a" id="vperm2f128"></a>
<tt>vperm2f128</tt> is the new instruction with four operands, which selects
128-bit blocks of floating point data from first and second source according
to the bit fields from fourth operand, and stores them in destination.
Destination and first source need to be AVX registers, second source can be
AVX register or 256-bit memory area, and fourth operand should be an immediate
value.

<pre>
&nbsp;&nbsp;&nbsp;&nbsp;vperm2f128&nbsp;ymm0,ymm6,ymm7,12h&nbsp;&nbsp;;&nbsp;permute&nbsp;128-bit&nbsp;blocks

</pre>

<div class="p"><!----></div>
<a class="a" id="vzeroall"></a>
<a class="a" id="vzeroupper"></a>
<tt>vzeroall</tt> instruction sets all the AVX registers to zero. <tt>vzeroupper</tt> sets
the upper 128-bit portions of all AVX registers to zero, leaving the SSE
registers intact. These new instructions take no operands.

<div class="p"><!----></div>
<a class="a" id="vldmxcsr"></a>
<a class="a" id="vstmxcsr"></a>
<tt>vldmxcsr</tt> and <tt>vstmxcsr</tt> are the AVX versions of <tt>ldmxcsr</tt> and <tt>stmxcsr</tt>
instructions. The rules for their operands remain unchanged.
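As with the original instructions, the operand is a 32-bit memory location;
the addresses below are arbitrary:

<pre>
&nbsp;&nbsp;&nbsp;&nbsp;vstmxcsr&nbsp;[esp]&nbsp;&nbsp;&nbsp;&nbsp;;&nbsp;save&nbsp;MXCSR&nbsp;to&nbsp;memory
&nbsp;&nbsp;&nbsp;&nbsp;vldmxcsr&nbsp;[ebx]&nbsp;&nbsp;&nbsp;&nbsp;;&nbsp;load&nbsp;MXCSR&nbsp;from&nbsp;memory

</pre>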

<div class="p"><!----></div>
     <a id="tth_sEc2.1.22"></a><h3>
2.1.22&nbsp;&nbsp;AVX2 instructions</h3>

<div class="p"><!----></div>
................................................................................
<div class="p"><!----></div>
The AVX instructions that operate on packed integers and had only 128-bit
variants have been supplemented with 256-bit variants, and thus their syntax
rules became analogous to AVX instructions operating on packed floating point
types.

<pre>
&nbsp;&nbsp;&nbsp;&nbsp;vpsubb&nbsp;ymm0,ymm0,[esi]&nbsp;&nbsp;&nbsp;;&nbsp;subtract&nbsp;32&nbsp;packed&nbsp;bytes
&nbsp;&nbsp;&nbsp;&nbsp;vpavgw&nbsp;ymm3,ymm0,ymm2&nbsp;&nbsp;&nbsp;&nbsp;;&nbsp;average&nbsp;of&nbsp;16-bit&nbsp;integers

</pre>
However there are some instructions that have not been equipped with the
256-bit variants. <tt>vpcmpestri</tt>, <tt>vpcmpestrm</tt>, <tt>vpcmpistri</tt>, <tt>vpcmpistrm</tt>,
<tt>vpextrb</tt>, <tt>vpextrw</tt>, <tt>vpextrd</tt>, <tt>vpextrq</tt>, <tt>vpinsrb</tt>, <tt>vpinsrw</tt>, <tt>vpinsrd</tt>,
<tt>vpinsrq</tt> and <tt>vphminposuw</tt> are not affected by AVX2 and allow only the
................................................................................
<pre>
&nbsp;&nbsp;&nbsp;&nbsp;vpsllw&nbsp;ymm2,ymm2,xmm4&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;;&nbsp;shift&nbsp;words&nbsp;left
&nbsp;&nbsp;&nbsp;&nbsp;vpsrad&nbsp;ymm0,ymm3,xword&nbsp;[ebx]&nbsp;;&nbsp;shift&nbsp;double&nbsp;words&nbsp;right

</pre>

<div class="p"><!----></div>

<a class="a" id="vpsllvd"></a>
<a class="a" id="vpsllvq"></a>
<a class="a" id="vpsrlvd"></a>
<a class="a" id="vpsrlvq"></a>
<a class="a" id="vpsravd"></a>
There are also new packed shift instructions with standard three-operand AVX
syntax, which shift each element from first source by the amount specified in
corresponding element of second source, and store the results in destination.
<tt>vpsllvd</tt> shifts 32-bit elements left, <tt>vpsllvq</tt> shifts 64-bit elements left,
<tt>vpsrlvd</tt> shifts 32-bit elements right logically, <tt>vpsrlvq</tt> shifts 64-bit
elements right logically and <tt>vpsravd</tt> shifts 32-bit elements right
arithmetically.

................................................................................

<div class="p"><!----></div>
Also <tt>vmovntdqa</tt> has been upgraded with 256-bit variant, so it can
transfer a 256-bit value from memory to AVX register; the memory address
needs to be aligned to 32 bytes.
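A sketch of such a load, assuming that esi holds a suitably aligned address:

<pre>
&nbsp;&nbsp;&nbsp;&nbsp;vmovntdqa&nbsp;ymm0,[esi]&nbsp;&nbsp;&nbsp;&nbsp;;&nbsp;load&nbsp;256&nbsp;bits&nbsp;with&nbsp;non-temporal&nbsp;hint

</pre>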

<div class="p"><!----></div>
<a class="a" id="vpmaskmovd"></a>
<a class="a" id="vpmaskmovq"></a>
<tt>vpmaskmovd</tt> and <tt>vpmaskmovq</tt> are the new instructions with syntax identical
to <tt>vmaskmovps</tt> or <tt>vmaskmovpd</tt>, and they perform analogous operation on
packed 32-bit or 64-bit values.
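By analogy with the floating point variants, and with arbitrarily chosen
operands, they could be written as:

<pre>
&nbsp;&nbsp;&nbsp;&nbsp;vpmaskmovd&nbsp;[edi],ymm0,ymm5&nbsp;&nbsp;;&nbsp;conditionally&nbsp;store&nbsp;double&nbsp;words
&nbsp;&nbsp;&nbsp;&nbsp;vpmaskmovq&nbsp;xmm5,xmm0,[esi]&nbsp;&nbsp;;&nbsp;conditionally&nbsp;load&nbsp;quad&nbsp;words

</pre>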

<div class="p"><!----></div>

<a class="a" id="vinserti128"></a>
<a class="a" id="vextracti128"></a>
<a class="a" id="vbroadcasti128"></a>
<a class="a" id="vperm2i128"></a>
<tt>vinserti128</tt>, <tt>vextracti128</tt>, <tt>vbroadcasti128</tt> and <tt>vperm2i128</tt> are the new
instructions with syntax identical to <tt>vinsertf128</tt>, <tt>vextractf128</tt>,
<tt>vbroadcastf128</tt> and <tt>vperm2f128</tt> respectively, and they perform analogous
operations on 128-bit blocks of integer data.
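The following lines sketch one use of each; all the operands, including the
immediate values, are illustrative only:

<pre>
&nbsp;&nbsp;&nbsp;&nbsp;vinserti128&nbsp;ymm0,ymm1,xmm2,1&nbsp;&nbsp;&nbsp;&nbsp;;&nbsp;replace&nbsp;upper&nbsp;128&nbsp;bits&nbsp;with&nbsp;xmm2
&nbsp;&nbsp;&nbsp;&nbsp;vextracti128&nbsp;xmm0,ymm1,1&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;;&nbsp;get&nbsp;upper&nbsp;128&nbsp;bits
&nbsp;&nbsp;&nbsp;&nbsp;vbroadcasti128&nbsp;ymm0,[esi]&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;;&nbsp;get&nbsp;two&nbsp;copies&nbsp;of&nbsp;128-bit&nbsp;block
&nbsp;&nbsp;&nbsp;&nbsp;vperm2i128&nbsp;ymm0,ymm1,ymm2,20h&nbsp;&nbsp;&nbsp;;&nbsp;low&nbsp;blocks&nbsp;of&nbsp;ymm1&nbsp;and&nbsp;ymm2

</pre>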

<div class="p"><!----></div>
<tt>vbroadcastss</tt> and <tt>vbroadcastsd</tt> instructions have been extended to allow
SSE register as a source operand (which in AVX could only be a memory).
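With a register source (arbitrarily xmm1 here) this might look like:

<pre>
&nbsp;&nbsp;&nbsp;&nbsp;vbroadcastss&nbsp;ymm0,xmm1&nbsp;&nbsp;&nbsp;&nbsp;;&nbsp;get&nbsp;eight&nbsp;copies&nbsp;of&nbsp;low&nbsp;32-bit&nbsp;float
&nbsp;&nbsp;&nbsp;&nbsp;vbroadcastsd&nbsp;ymm0,xmm1&nbsp;&nbsp;&nbsp;&nbsp;;&nbsp;get&nbsp;four&nbsp;copies&nbsp;of&nbsp;low&nbsp;64-bit&nbsp;float

</pre>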

<div class="p"><!----></div>
<a class="a" id="vpbroadcastb"></a>
<a class="a" id="vpbroadcastw"></a>
<a class="a" id="vpbroadcastd"></a>
<a class="a" id="vpbroadcastq"></a>
<tt>vpbroadcastb</tt>, <tt>vpbroadcastw</tt>, <tt>vpbroadcastd</tt> and <tt>vpbroadcastq</tt> are the
new instructions which broadcast the byte, word, double word or quad word from
the source operand into all elements of corresponding size in the destination
register. The destination operand can be either SSE or AVX register, and the
source operand can be SSE register or memory of size equal to the size of data
element.

<pre>
&nbsp;&nbsp;&nbsp;&nbsp;vpbroadcastb&nbsp;ymm0,byte&nbsp;[ebx]&nbsp;&nbsp;;&nbsp;get&nbsp;32&nbsp;identical&nbsp;bytes

</pre>

<div class="p"><!----></div>
<a class="a" id="vpermd"></a>
<a class="a" id="vpermps"></a>
<tt>vpermd</tt> and <tt>vpermps</tt> are new three-operand instructions, which use each
32-bit element from first source as an index of element in second source which
is copied into destination at position corresponding to element containing
index. The destination and first source have to be AVX registers, and the
second source can be AVX register or 256-bit memory.
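In the sketch below (operands chosen arbitrarily) the indexes are held in
ymm1 and the data to be permuted comes from the last operand:

<pre>
&nbsp;&nbsp;&nbsp;&nbsp;vpermd&nbsp;ymm0,ymm1,ymm2&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;;&nbsp;indexes&nbsp;in&nbsp;ymm1,&nbsp;data&nbsp;in&nbsp;ymm2
&nbsp;&nbsp;&nbsp;&nbsp;vpermps&nbsp;ymm0,ymm1,[esi]&nbsp;&nbsp;&nbsp;;&nbsp;indexes&nbsp;in&nbsp;ymm1,&nbsp;floats&nbsp;in&nbsp;memory

</pre>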

<div class="p"><!----></div>
<a class="a" id="vpermq"></a>
<a class="a" id="vpermpd"></a>
<tt>vpermq</tt> and <tt>vpermpd</tt> are new three-operand instructions, which use 2-bit
indexes from the immediate value specified as third operand to determine which
element from source is stored at given position in destination. The destination
has to be AVX register, source can be AVX register or 256-bit memory, and the
third operand must be 8-bit immediate value.
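Two illustrative uses, with immediate values selected for this sketch only:

<pre>
&nbsp;&nbsp;&nbsp;&nbsp;vpermq&nbsp;ymm0,ymm1,1Bh&nbsp;&nbsp;&nbsp;&nbsp;;&nbsp;reverse&nbsp;order&nbsp;of&nbsp;four&nbsp;quad&nbsp;words
&nbsp;&nbsp;&nbsp;&nbsp;vpermpd&nbsp;ymm0,[esi],0&nbsp;&nbsp;&nbsp;&nbsp;;&nbsp;get&nbsp;four&nbsp;copies&nbsp;of&nbsp;lowest&nbsp;double

</pre>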

<div class="p"><!----></div>
The family of new instructions performing <tt>gather</tt> operation have special
................................................................................
destination and mask registers, the higher elements of destination are zeroed.
After the value is successfully loaded, the corresponding element in mask
register is set to zero. The destination, index and mask should all be
distinct registers, it is not allowed to use the same register in two
different roles.

<div class="p"><!----></div>
<a class="a" id="vgatherdps"></a>
<tt>vgatherdps</tt> loads single precision floating point values addressed by
32-bit indexes. The destination, index and mask should all be registers of the
same type, either SSE or AVX. The data addressed by memory operand is 32-bit
in size.

<pre>
&nbsp;&nbsp;&nbsp;&nbsp;vgatherdps&nbsp;xmm0,[eax+xmm1],xmm3&nbsp;&nbsp;&nbsp;&nbsp;;&nbsp;gather&nbsp;four&nbsp;floats
&nbsp;&nbsp;&nbsp;&nbsp;vgatherdps&nbsp;ymm0,[ebx+ymm7*4],ymm3&nbsp;&nbsp;;&nbsp;gather&nbsp;eight&nbsp;floats

</pre>

<div class="p"><!----></div>
<a class="a" id="vgatherqps"></a>
<tt>vgatherqps</tt> loads single precision floating point values addressed by
64-bit indexes. The destination and mask should always be SSE registers, while
index register can be either SSE or AVX register. The data addressed by memory
operand is 32-bit in size.

<pre>
&nbsp;&nbsp;&nbsp;&nbsp;vgatherqps&nbsp;xmm0,[xmm2],xmm3&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;;&nbsp;gather&nbsp;two&nbsp;floats
&nbsp;&nbsp;&nbsp;&nbsp;vgatherqps&nbsp;xmm0,[ymm2+64],xmm3&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;;&nbsp;gather&nbsp;four&nbsp;floats

</pre>

<div class="p"><!----></div>
<a class="a" id="vgatherdpd"></a>
<tt>vgatherdpd</tt> loads double precision floating point values addressed by
32-bit indexes. The index register should always be SSE register, the
destination and mask should be two registers of the same type, either SSE or
AVX. The data addressed by memory operand is 64-bit in size.

<pre>
&nbsp;&nbsp;&nbsp;&nbsp;vgatherdpd&nbsp;xmm0,[ebp+xmm1],xmm3&nbsp;&nbsp;&nbsp;&nbsp;;&nbsp;gather&nbsp;two&nbsp;doubles
&nbsp;&nbsp;&nbsp;&nbsp;vgatherdpd&nbsp;ymm0,[xmm3*8],ymm5&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;;&nbsp;gather&nbsp;four&nbsp;doubles

</pre>

<div class="p"><!----></div>
<a class="a" id="vgatherqpd"></a>
<tt>vgatherqpd</tt> loads double precision floating point values addressed by
64-bit indexes. The destination, index and mask should all be registers of the
same type, either SSE or AVX. The data addressed by memory operand is 64-bit
in size.
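Following the pattern of the examples given for the other gather
instructions, and with arbitrarily chosen registers, this could be written as:

<pre>
&nbsp;&nbsp;&nbsp;&nbsp;vgatherqpd&nbsp;xmm0,[xmm1],xmm3&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;;&nbsp;gather&nbsp;two&nbsp;doubles
&nbsp;&nbsp;&nbsp;&nbsp;vgatherqpd&nbsp;ymm0,[esi+ymm2*8],ymm3&nbsp;&nbsp;;&nbsp;gather&nbsp;four&nbsp;doubles

</pre>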

<div class="p"><!----></div>
<a class="a" id="vpgatherdd"></a>
<a class="a" id="vpgatherqd"></a>
<tt>vpgatherdd</tt> and <tt>vpgatherqd</tt> load 32-bit values addressed by either 32-bit
or 64-bit indexes. They follow the same rules as <tt>vgatherdps</tt> and <tt>vgatherqps</tt>
respectively.
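A sketch with arbitrary operands:

<pre>
&nbsp;&nbsp;&nbsp;&nbsp;vpgatherdd&nbsp;ymm0,[ebx+ymm1*4],ymm2&nbsp;&nbsp;;&nbsp;gather&nbsp;eight&nbsp;double&nbsp;words
&nbsp;&nbsp;&nbsp;&nbsp;vpgatherqd&nbsp;xmm0,[ymm1],xmm2&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;;&nbsp;gather&nbsp;four&nbsp;double&nbsp;words

</pre>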

<div class="p"><!----></div>
<a class="a" id="vpgatherdq"></a>
<a class="a" id="vpgatherqq"></a>
<tt>vpgatherdq</tt> and <tt>vpgatherqq</tt> load 64-bit values addressed by either 32-bit
or 64-bit indexes. They follow the same rules as <tt>vgatherdpd</tt> and <tt>vgatherqpd</tt>
respectively.
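For instance, assuming arbitrarily chosen registers:

<pre>
&nbsp;&nbsp;&nbsp;&nbsp;vpgatherdq&nbsp;ymm0,[xmm1*8],ymm2&nbsp;&nbsp;&nbsp;&nbsp;;&nbsp;gather&nbsp;four&nbsp;quad&nbsp;words
&nbsp;&nbsp;&nbsp;&nbsp;vpgatherqq&nbsp;xmm0,[eax+xmm1],xmm2&nbsp;&nbsp;;&nbsp;gather&nbsp;two&nbsp;quad&nbsp;words

</pre>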

<div class="p"><!----></div>
     <a id="tth_sEc2.1.23"></a><h3>
2.1.23&nbsp;&nbsp;Auxiliary sets of computational instructions</h3>

................................................................................
The AES extension provides a specialized set of instructions for the
purpose of cryptographic computations defined by Advanced Encryption Standard.
Each of these instructions has two versions: the AVX one and the one with
SSE-like syntax that uses classic encoding. Refer to the Intel manuals for the
details of operation of these instructions.

<div class="p"><!----></div>

<a class="a" id="aesenc"></a>
<a class="a" id="aesenclast"></a>
<a class="a" id="vaesenc"></a>
<a class="a" id="vaesenclast"></a>
<tt>aesenc</tt> and <tt>aesenclast</tt> perform a single round of AES encryption on data
from first source with a round key from second source, and store result in
destination. The destination and first source are SSE registers, and the
second source can be SSE register or 128-bit memory. The AVX versions of these
instructions, <tt>vaesenc</tt> and <tt>vaesenclast</tt>, use the syntax with three operands,
while the SSE-like version has only two operands, with first operand being
both the destination and first source.
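The two syntaxes can be compared in a short sketch (the registers and the
location of the round key are assumptions made for illustration):

<pre>
&nbsp;&nbsp;&nbsp;&nbsp;aesenc&nbsp;xmm0,xmm1&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;;&nbsp;one&nbsp;round,&nbsp;key&nbsp;in&nbsp;xmm1
&nbsp;&nbsp;&nbsp;&nbsp;vaesenclast&nbsp;xmm0,xmm1,[esi]&nbsp;&nbsp;;&nbsp;last&nbsp;round,&nbsp;key&nbsp;in&nbsp;memory

</pre>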

<div class="p"><!----></div>
<a class="a" id="aesdec"></a>
<a class="a" id="aesdeclast"></a>
<tt>aesdec</tt> and <tt>aesdeclast</tt> perform a single round of AES decryption on data
from first source with a round key from second source. The syntax rules for
them and their AVX versions are the same as for <tt>aesenc</tt>.
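A corresponding sketch for decryption, again with arbitrary operands:

<pre>
&nbsp;&nbsp;&nbsp;&nbsp;aesdec&nbsp;xmm0,[esi]&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;;&nbsp;one&nbsp;round,&nbsp;key&nbsp;in&nbsp;memory
&nbsp;&nbsp;&nbsp;&nbsp;vaesdeclast&nbsp;xmm2,xmm0,xmm1&nbsp;&nbsp;&nbsp;;&nbsp;last&nbsp;round,&nbsp;key&nbsp;in&nbsp;xmm1

</pre>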

<div class="p"><!----></div>
<a class="a" id="aesimc"></a>
<a class="a" id="vaesimc"></a>
<tt>aesimc</tt> performs the InvMixColumns transformation of source operand and
stores the result in destination. Both <tt>aesimc</tt> and <tt>vaesimc</tt> use only two
operands, destination being SSE register, and source being SSE register or
128-bit memory location.
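For example (operands chosen arbitrarily):

<pre>
&nbsp;&nbsp;&nbsp;&nbsp;aesimc&nbsp;xmm1,xmm0&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;;&nbsp;transform&nbsp;round&nbsp;key&nbsp;from&nbsp;xmm0
&nbsp;&nbsp;&nbsp;&nbsp;vaesimc&nbsp;xmm1,[esi]&nbsp;&nbsp;&nbsp;;&nbsp;transform&nbsp;round&nbsp;key&nbsp;from&nbsp;memory

</pre>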

<div class="p"><!----></div>
<a class="a" id="aeskeygenassist"></a>
<tt>aeskeygenassist</tt> is a helper instruction for generating the round key.
It needs three operands: destination being SSE register, source being SSE
register or 128-bit memory, and third operand being 8-bit immediate value.
The AVX version of this instruction uses the same syntax.
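A sketch of both versions follows; the registers and the immediate value
are arbitrary:

<pre>
&nbsp;&nbsp;&nbsp;&nbsp;aeskeygenassist&nbsp;xmm1,xmm0,1&nbsp;&nbsp;&nbsp;&nbsp;;&nbsp;classic&nbsp;encoding
&nbsp;&nbsp;&nbsp;&nbsp;vaeskeygenassist&nbsp;xmm1,[esi],1&nbsp;&nbsp;;&nbsp;AVX&nbsp;version,&nbsp;source&nbsp;in&nbsp;memory

</pre>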

<div class="p"><!----></div>
<a class="a" id="pclmulqdq"></a>
<a class="a" id="vpclmulqdq"></a>
The CLMUL extension introduces just one instruction, <tt>pclmulqdq</tt>, and its
AVX version as well. This instruction performs a carryless multiplication of
two 64-bit values selected from first and second source according to the bit
fields in immediate value. The destination and first source are SSE registers,
second source is SSE register or 128-bit memory, and immediate value is
provided as last operand. <tt>vpclmulqdq</tt> takes four operands, while <tt>pclmulqdq</tt>
takes only three operands, with the first one serving both the role of
destination and first source.
................................................................................
The FMA (Fused Multiply-Add) extension introduces additional AVX
instructions which perform multiplication and summation as single operation.
Each one takes three operands, first one serving both the role of destination
and first source, and the following ones being the second and third source.
The mnemonic of FMA instruction is obtained by appending to <tt>vf</tt> prefix: first
either <tt>m</tt> or <tt>nm</tt> to select whether result of multiplication should be taken
as-is or negated, then either <tt>add</tt> or <tt>sub</tt> to select whether third value
will be added to the product or subtracted from the product, then either
<tt>132</tt>, <tt>213</tt> or <tt>231</tt> to select which source operands are multiplied and which
one is added or subtracted, and finally the type of data on which the
instruction operates, either <tt>ps</tt>, <tt>pd</tt>, <tt>ss</tt> or <tt>sd</tt>. As it was with SSE
instructions promoted to AVX, instructions operating on packed floating point
values allow 128-bit or 256-bit syntax, in former all the operands are SSE
registers, but the third one can also be a 128-bit memory, in latter the
operands are AVX registers and the third one can also be a 256-bit memory.
Instructions that compute just one floating point result need operands to be
SSE registers, and the third operand can also be a memory, either 32-bit for
single precision or 64-bit for double precision.

<pre>
&nbsp;&nbsp;&nbsp;&nbsp;vfmsub231ps&nbsp;ymm1,ymm2,ymm3&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;;&nbsp;multiply&nbsp;and&nbsp;subtract
&nbsp;&nbsp;&nbsp;&nbsp;vfnmadd132sd&nbsp;xmm0,xmm5,[ebx]&nbsp;&nbsp;&nbsp;;&nbsp;multiply,&nbsp;negate&nbsp;and&nbsp;add

</pre>
In addition to the instructions created by the rule described above, there are
families of instructions with mnemonics starting with either <tt>vfmaddsub</tt> or
<tt>vfmsubadd</tt>, followed by either <tt>132</tt>, <tt>213</tt> or <tt>231</tt> and then either <tt>ps</tt> or
<tt>pd</tt> (the operation must always be on packed values in this case). They add
to the result of multiplication or subtract from it depending on the position
of value in packed data - instructions from the <tt>vfmaddsub</tt> group add when the
position is odd and subtract when the position is even, instructions from the
<tt>vfmsubadd</tt> group add when the position is even and subtract when the
position is odd. The rules for operands are the same as for other FMA
instructions.
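Two mnemonics built by this rule, shown with arbitrarily chosen operands:

<pre>
&nbsp;&nbsp;&nbsp;&nbsp;vfmaddsub231ps&nbsp;ymm0,ymm1,ymm2&nbsp;&nbsp;&nbsp;&nbsp;;&nbsp;add&nbsp;at&nbsp;odd,&nbsp;subtract&nbsp;at&nbsp;even&nbsp;positions
&nbsp;&nbsp;&nbsp;&nbsp;vfmsubadd132pd&nbsp;xmm0,xmm1,[esi]&nbsp;&nbsp;&nbsp;;&nbsp;subtract&nbsp;at&nbsp;odd,&nbsp;add&nbsp;at&nbsp;even&nbsp;positions

</pre>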

<div class="p"><!----></div>
The FMA4 instructions are similar to FMA, but use syntax with four operands
and thus allow destination to be different than all the sources. Their
mnemonics are identical to FMA instructions with the <tt>132</tt>, <tt>213</tt> or <tt>231</tt> cut
out, as having separate destination operand makes such selection of operands
superfluous. The multiplication is always performed on values from the first
and second source, and then the value from third source is added or
subtracted. Either second or third source can be a memory operand, and the
rules for the sizes of operands are the same as for FMA instructions.

<pre>
&nbsp;&nbsp;&nbsp;&nbsp;vfmaddpd&nbsp;ymm0,ymm1,[esi],ymm2&nbsp;&nbsp;;&nbsp;multiply&nbsp;and&nbsp;add
&nbsp;&nbsp;&nbsp;&nbsp;vfmsubss&nbsp;xmm0,xmm1,xmm2,[ebx]&nbsp;&nbsp;;&nbsp;multiply&nbsp;and&nbsp;subtract

</pre>

<div class="p"><!----></div>
<a class="a" id="vcvtps2ph"></a>
<a class="a" id="vcvtph2ps"></a>
The F16C extension consists of two instructions, <tt>vcvtps2ph</tt> and
<tt>vcvtph2ps</tt>, which convert floating point values between single precision and
half precision (the 16-bit floating point format). <tt>vcvtps2ph</tt> takes three
operands: destination, source, and rounding controls. The third operand is
always an immediate, the source is either SSE or AVX register containing
single precision values, and the destination is SSE register or memory; the
size of memory is 64 bits when the source is SSE register and 128 bits when
the source is AVX register. <tt>vcvtph2ps</tt> takes two operands, the destination
that can be SSE or AVX register, and the source that is SSE register or memory
with size of the half of destination operand's size.
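
<div class="p"><!----></div>
A minimal sketch of both conversions follows. The registers and addresses are
chosen arbitrarily for illustration, and the immediate value 0 is assumed to
select rounding to nearest, as described in the Intel manuals.

<pre>
&nbsp;&nbsp;&nbsp;&nbsp;vcvtps2ph&nbsp;[edi],ymm1,0&nbsp;&nbsp;;&nbsp;convert&nbsp;eight&nbsp;floats&nbsp;to&nbsp;half&nbsp;precision
&nbsp;&nbsp;&nbsp;&nbsp;vcvtph2ps&nbsp;ymm2,[esi]&nbsp;&nbsp;&nbsp;&nbsp;;&nbsp;convert&nbsp;eight&nbsp;half&nbsp;precision&nbsp;values&nbsp;back

</pre>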

<div class="p"><!----></div>

<a class="a" id="vfrczps"></a>
<a class="a" id="vfrczss"></a>
<a class="a" id="vfrczpd"></a>
<a class="a" id="vfrczsd"></a>
The AMD XOP extension introduces a number of new vector instructions with
encoding and syntax analogous to AVX instructions. <tt>vfrczps</tt>, <tt>vfrczss</tt>,
<tt>vfrczpd</tt> and <tt>vfrczsd</tt> extract fractional portions of single or double
precision values, they all take two operands. The packed operations allow
either SSE or AVX register as destination, for the other two it has to be SSE
register. Source can be register of the same type as destination, or memory
of appropriate size (256-bit for destination being AVX register, 128-bit for
packed operation with destination being SSE register, 64-bit for operation
................................................................................

<pre>
&nbsp;&nbsp;&nbsp;&nbsp;vfrczps&nbsp;ymm0,[esi]&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;;&nbsp;load&nbsp;fractional&nbsp;parts

</pre>

<div class="p"><!----></div>
<a class="a" id="vpcmov"></a>
<tt>vpcmov</tt> copies bits from either first or second source into destination
depending on the values of corresponding bits in the fourth operand (the
selector). If the bit in selector is set, the corresponding bit from first
source is copied into the same position in destination, otherwise the bit from
second source is copied. Either second source or selector can be memory
location, 128-bit or 256-bit depending on whether SSE registers or AVX
registers are specified as the other operands.

................................................................................

<div class="p"><!----></div>

<div class="p"><!----></div>
<a id="tth_tAb2.5">
</a> 
<div style="text-align:center">
<table border="1" class="tabular">
<tr><td align="center">Code </td><td align="center">Mnemonic </td><td align="center">Description </td></tr><tr><td></td></tr>
<tr><td align="center">0 </td><td align="center"><tt>lt</tt> </td><td align="center">less than </td></tr>
<tr><td align="center">1 </td><td align="center"><tt>le</tt> </td><td align="center">less than or equal </td></tr>
<tr><td align="center">2 </td><td align="center"><tt>gt</tt> </td><td align="center">greater than </td></tr>
<tr><td align="center">3 </td><td align="center"><tt>ge</tt> </td><td align="center">greater than or equal </td></tr>
<tr><td align="center">4 </td><td align="center"><tt>eq</tt> </td><td align="center">equal </td></tr>
<tr><td align="center">5 </td><td align="center"><tt>neq</tt> </td><td align="center">not equal </td></tr>
................................................................................
</div>

<div style="text-align:center">Table 2.5: XOP comparisons.</div>
<a id="tab:XOP_comparisons">
</a>

<div class="p"><!----></div>
<a class="a" id="vpermil2ps"></a>
<a class="a" id="vpermil2pd"></a>
<tt>vpermil2ps</tt> and <tt>vpermil2pd</tt> set the elements in destination register to
zero or to a value selected from first or second source depending on the
corresponding bit fields from the fourth operand (the selector) and the
immediate value provided in fifth operand. Refer to the AMD manuals for the
detailed explanation of the operation performed by these instructions. Each
of the first four operands can be a register, and either second source or
selector can be memory location, 128-bit or 256-bit depending on whether SSE
registers or AVX registers are used for the other operands.
................................................................................

<pre>
&nbsp;&nbsp;&nbsp;&nbsp;vpermil2ps&nbsp;ymm0,ymm3,ymm7,ymm2,0&nbsp;&nbsp;;&nbsp;permute&nbsp;from&nbsp;two&nbsp;sources

</pre>

<div class="p"><!----></div>

<a class="a" id="vphaddbw"></a>
<a class="a" id="vphaddubw"></a>
<a class="a" id="vphaddbd"></a>
<a class="a" id="vphaddubd"></a>
<a class="a" id="vphaddbq"></a>
<a class="a" id="vphaddubq"></a>
<a class="a" id="vphaddwd"></a>
<a class="a" id="vphadduwd"></a>
<a class="a" id="vphaddwq"></a>
<a class="a" id="vphadduwq"></a>
<a class="a" id="vphadddq"></a>
<a class="a" id="vphaddudq"></a>
<a class="a" id="vphsubbw"></a>
<a class="a" id="vphsubwd"></a>
<a class="a" id="vphsubdq"></a>
<tt>vphaddbw</tt> adds pairs of adjacent signed bytes to form 16-bit values and
stores them at the same positions in destination. <tt>vphaddubw</tt> does the same
but treats the bytes as unsigned. <tt>vphaddbd</tt> and <tt>vphaddubd</tt> sum all bytes
(either signed or unsigned) in each four-byte block to 32-bit results,
<tt>vphaddbq</tt> and <tt>vphaddubq</tt> sum all bytes in each eight-byte block to
64-bit results, <tt>vphaddwd</tt> and <tt>vphadduwd</tt> add pairs of words to 32-bit
results, <tt>vphaddwq</tt> and <tt>vphadduwq</tt> sum all words in each four-word block to
64-bit results, <tt>vphadddq</tt> and <tt>vphaddudq</tt> add pairs of double words to 64-bit
results. <tt>vphsubbw</tt> subtracts in each two-byte block the byte at higher
position from the one at lower position, and stores the result as a signed
16-bit value at the corresponding position in destination, <tt>vphsubwd</tt>
subtracts in each two-word block the word at higher position from the one at
lower position and makes signed 32-bit results, <tt>vphsubdq</tt> subtracts in each
block of two double words the one at higher position from the one at lower
position and makes signed 64-bit results. Each of these instructions takes
two operands, the destination being SSE register, and the source being SSE
register or 128-bit memory.

<pre>
&nbsp;&nbsp;&nbsp;&nbsp;vphadduwq&nbsp;xmm0,xmm1&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;;&nbsp;sum&nbsp;quadruplets&nbsp;of&nbsp;words

</pre>

<div class="p"><!----></div>

<a class="a" id="vpmacsww"></a>
<a class="a" id="vpmacssww"></a>
<a class="a" id="vpmacsdd"></a>
<a class="a" id="vpmacssdd"></a>
<a class="a" id="vpmacswd"></a>
<a class="a" id="vpmacsswd"></a>
<a class="a" id="vpmacsdql"></a>
<a class="a" id="vpmacssdql"></a>
<a class="a" id="vpmacsdqh"></a>
<a class="a" id="vpmacssdqh"></a>
<a class="a" id="vpmadcswd"></a>
<a class="a" id="vpmadcsswd"></a>
<tt>vpmacsww</tt> and <tt>vpmacssww</tt> multiply the corresponding signed 16-bit values
from the first and second source and then add the products to the parallel
values from the third source, then <tt>vpmacsww</tt> takes the lowest 16 bits of the
result and <tt>vpmacssww</tt> saturates the result down to 16-bit value, and they
store the final 16-bit results in the destination. <tt>vpmacsdd</tt> and <tt>vpmacssdd</tt>
perform the analogous operation on 32-bit values. <tt>vpmacswd</tt> and <tt>vpmacsswd</tt> do
the same calculation only on the low 16-bit values from each 32-bit block and
form the 32-bit results. <tt>vpmacsdql</tt> and <tt>vpmacssdql</tt> perform such operation
................................................................................

<pre>
&nbsp;&nbsp;&nbsp;&nbsp;vpmacsdd&nbsp;xmm6,xmm1,[ebx],xmm6&nbsp;&nbsp;;&nbsp;accumulate&nbsp;product

</pre>

<div class="p"><!----></div>
<a class="a" id="vpperm"></a>
<tt>vpperm</tt> selects bytes from first and second source, optionally applies a
separate transformation to each of them, and stores them in the destination.
The bit fields in fourth operand (the selector) specify for each position in
destination what byte from which source is taken and what operation is applied
to it before it is stored there. Refer to the AMD manuals for the detailed
information about these bit fields. This instruction takes four operands,
either second source or selector can be a 128-bit memory (or they can be SSE
registers both), all the other operands have to be SSE registers.

<div class="p"><!----></div>
<a class="a" id="vpshlb"></a>
<a class="a" id="vpshlw"></a>
<a class="a" id="vpshld"></a>
<a class="a" id="vpshlq"></a>
<tt>vpshlb</tt>, <tt>vpshlw</tt>, <tt>vpshld</tt> and <tt>vpshlq</tt> shift logically bytes, words, double
words or quad words respectively. The amount of bits to shift by is specified
for each element separately by the signed byte placed at the corresponding
position in the third operand. The source containing elements to shift is
provided as second operand. Either second or third operand can be 128-bit
memory (or they can be SSE registers both) and the other operands have to be
SSE registers.

<pre>
&nbsp;&nbsp;&nbsp;&nbsp;vpshld&nbsp;xmm3,xmm1,[ebx]&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;;&nbsp;shift&nbsp;double&nbsp;words&nbsp;from&nbsp;xmm1

</pre>

<div class="p"><!----></div>

<a class="a" id="vpshab"></a>
<a class="a" id="vpshaw"></a>
<a class="a" id="vpshad"></a>
<a class="a" id="vpshaq"></a>
<a class="a" id="vprotb"></a>
<a class="a" id="vprotw"></a>
<a class="a" id="vprotd"></a>
<a class="a" id="vprotq"></a>
<tt>vpshab</tt>, <tt>vpshaw</tt>, <tt>vpshad</tt> and <tt>vpshaq</tt> arithmetically shift bytes, words,
double words or quad words. These instructions follow the same rules as the
logical shifts described above. <tt>vprotb</tt>, <tt>vprotw</tt>, <tt>vprotd</tt> and <tt>vprotq</tt>
rotate bytes, words, double words or quad words. They follow the same rules as
shifts, but additionally allow third operand to be immediate value, in which
case the same amount of rotation is specified for all the elements in source.

<pre>
&nbsp;&nbsp;&nbsp;&nbsp;vprotb&nbsp;xmm0,[esi],3&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;;&nbsp;rotate&nbsp;bytes&nbsp;to&nbsp;the&nbsp;left

</pre>

<div class="p"><!----></div>
<a class="a" id="movbe"></a>
The MOVBE extension introduces just one new instruction, <tt>movbe</tt>, which
swaps bytes in value from source before storing it in destination, so it can
be used to load and store big endian values. It takes two operands, either
the destination or source should be a 16-bit, 32-bit or 64-bit memory (the
last one being only allowed in long mode), and the other operand should be
a general register of the same size.
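
<div class="p"><!----></div>
As a short illustration, with registers picked arbitrarily, the two directions
of transfer could look like this:

<pre>
&nbsp;&nbsp;&nbsp;&nbsp;movbe&nbsp;eax,[esi]&nbsp;&nbsp;;&nbsp;load&nbsp;big&nbsp;endian&nbsp;double&nbsp;word
&nbsp;&nbsp;&nbsp;&nbsp;movbe&nbsp;[edi],ax&nbsp;&nbsp;&nbsp;;&nbsp;store&nbsp;word&nbsp;in&nbsp;big&nbsp;endian&nbsp;order

</pre>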

<div class="p"><!----></div>
................................................................................
The BMI extension, consisting of two subsets - BMI1 and BMI2, introduces
new instructions operating on general registers, which use the same encoding
as AVX instructions and so allow the extended syntax. All these instructions
use 32-bit operands, and in long mode they also allow the forms with 64-bit
operands.

<div class="p"><!----></div>
<a class="a" id="andn"></a>
<tt>andn</tt> calculates the bitwise AND of second source with the inverted bits
of first source and stores the result in destination. The destination and
the first source have to be general registers, the second source can be
general register or memory.

<pre>
&nbsp;&nbsp;&nbsp;&nbsp;andn&nbsp;edx,eax,[ebx]&nbsp;&nbsp;&nbsp;;&nbsp;bit-multiply&nbsp;inverted&nbsp;eax&nbsp;with&nbsp;memory

</pre>

<div class="p"><!----></div>
<a class="a" id="bextr"></a>
<tt>bextr</tt> extracts from the first source the sequence of bits using an index
and length specified by bit fields in the second source operand and stores
it into destination. The lowest 8 bits of second source specify the position
of bit sequence to extract and the next 8 bits of second source specify the
length of sequence. The first source can be a general register or memory,
the other two operands have to be general registers.

<pre>
&nbsp;&nbsp;&nbsp;&nbsp;bextr&nbsp;eax,[esi],ecx&nbsp;&nbsp;;&nbsp;extract&nbsp;bit&nbsp;field&nbsp;from&nbsp;memory

</pre>

<div class="p"><!----></div>
<a class="a" id="blsi"></a>
<tt>blsi</tt> extracts the lowest set bit from the source, setting all the other
bits in destination to zero. The destination must be a general register,
the source can be general register or memory.

<pre>
&nbsp;&nbsp;&nbsp;&nbsp;blsi&nbsp;rax,r11&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;;&nbsp;isolate&nbsp;the&nbsp;lowest&nbsp;set&nbsp;bit

</pre>

<div class="p"><!----></div>
<a class="a" id="blsmsk"></a>
<a class="a" id="blsr"></a>
<tt>blsmsk</tt> sets all the bits in the destination up to the lowest set bit in
the source, including this bit. <tt>blsr</tt> copies all the bits from the source to
destination except for the lowest set bit, which is replaced by zero. These
instructions follow the same rules for operands as <tt>blsi</tt>.
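
<div class="p"><!----></div>
As a worked illustration, assume that ECX contains the arbitrarily chosen value
01011000b, so that the lowest set bit is at position 3. The three instructions
would then give the results noted in the comments (only the low eight bits are
shown, the higher bits are zero).

<pre>
&nbsp;&nbsp;&nbsp;&nbsp;blsi&nbsp;eax,ecx&nbsp;&nbsp;&nbsp;&nbsp;;&nbsp;eax&nbsp;=&nbsp;00001000b
&nbsp;&nbsp;&nbsp;&nbsp;blsmsk&nbsp;eax,ecx&nbsp;&nbsp;;&nbsp;eax&nbsp;=&nbsp;00001111b
&nbsp;&nbsp;&nbsp;&nbsp;blsr&nbsp;eax,ecx&nbsp;&nbsp;&nbsp;&nbsp;;&nbsp;eax&nbsp;=&nbsp;01010000b

</pre>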

<div class="p"><!----></div>
<a class="a" id="tzcnt"></a>
<a class="a" id="lzcnt"></a>
<tt>tzcnt</tt> counts the number of trailing zero bits, that is the zero bits up to
the lowest set bit of source value. This instruction is analogous to <tt>lzcnt</tt>
and follows the same rules for operands, so it also has a 16-bit version,
unlike the other BMI instructions.
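
<div class="p"><!----></div>
A brief sketch with arbitrarily chosen operands, where the value assumed in ECX
is given only for illustration:

<pre>
&nbsp;&nbsp;&nbsp;&nbsp;tzcnt&nbsp;eax,ecx&nbsp;&nbsp;&nbsp;;&nbsp;for&nbsp;ecx&nbsp;=&nbsp;01011000b&nbsp;the&nbsp;result&nbsp;is&nbsp;3
&nbsp;&nbsp;&nbsp;&nbsp;tzcnt&nbsp;ax,[ebx]&nbsp;&nbsp;;&nbsp;16-bit&nbsp;version&nbsp;with&nbsp;source&nbsp;in&nbsp;memory

</pre>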

<div class="p"><!----></div>
<a class="a" id="bzhi"></a>
<tt>bzhi</tt> is BMI2 instruction, which copies the bits from first source to
destination, zeroing all the bits up from the position specified by second
source. It follows the same rules for operands as <tt>bextr</tt>.
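
<div class="p"><!----></div>
For illustration only, assuming that ECX holds the value 4:

<pre>
&nbsp;&nbsp;&nbsp;&nbsp;bzhi&nbsp;eax,[esi],ecx&nbsp;&nbsp;;&nbsp;keep&nbsp;bits&nbsp;0&nbsp;to&nbsp;3&nbsp;of&nbsp;value&nbsp;from&nbsp;memory

</pre>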

<div class="p"><!----></div>
<a class="a" id="pext"></a>
<a class="a" id="pdep"></a>
<tt>pext</tt> uses a mask in second source operand to select bits from first
source and puts the selected bits as a continuous sequence into destination.
<tt>pdep</tt> performs the reverse operation - it takes sequence of bits from the
first source and puts them consecutively at the positions where the bits in
second source are set, setting all the other bits in destination to zero.
These BMI2 instructions follow the same rules for operands as <tt>andn</tt>.
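
<div class="p"><!----></div>
The following sketch shows both operations on sample values. The register
contents given in the comments are assumptions made for the sake of the
example, and only the low eight bits are shown.

<pre>
&nbsp;&nbsp;&nbsp;&nbsp;pext&nbsp;eax,edx,ecx&nbsp;&nbsp;;&nbsp;edx&nbsp;=&nbsp;10110110b,&nbsp;ecx&nbsp;=&nbsp;11110000b:&nbsp;eax&nbsp;=&nbsp;00001011b
&nbsp;&nbsp;&nbsp;&nbsp;pdep&nbsp;eax,edx,ecx&nbsp;&nbsp;;&nbsp;edx&nbsp;=&nbsp;00001011b,&nbsp;ecx&nbsp;=&nbsp;11110000b:&nbsp;eax&nbsp;=&nbsp;10110000b

</pre>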

<div class="p"><!----></div>
<a class="a" id="mulx"></a>
<tt>mulx</tt> is a BMI2 instruction which performs an unsigned multiplication of
value from EDX or RDX register (depending on the size of specified operands)
by the value from third operand, and stores the low half of result in the
second operand, and the high half of result in the first operand, and it does
it without affecting the flags. The third operand can be general register or
memory, and both the destination operands have to be general registers.

<pre>
&nbsp;&nbsp;&nbsp;&nbsp;mulx&nbsp;edx,eax,ecx&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;;&nbsp;multiply&nbsp;edx&nbsp;by&nbsp;ecx&nbsp;into&nbsp;edx:eax

</pre>

<div class="p"><!----></div>
<a class="a" id="shlx"></a>
<a class="a" id="shrx"></a>
<a class="a" id="sarx"></a>
<tt>shlx</tt>, <tt>shrx</tt> and <tt>sarx</tt> are BMI2 instructions, which perform logical or
arithmetical shifts of value from first source by the amount specified by
second source, and store the result in destination without affecting the
flags. They have the same rules for operands as <tt>bzhi</tt> instruction.
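
<div class="p"><!----></div>
A short sketch with arbitrarily chosen registers:

<pre>
&nbsp;&nbsp;&nbsp;&nbsp;shlx&nbsp;eax,[esi],ecx&nbsp;&nbsp;;&nbsp;shift&nbsp;value&nbsp;from&nbsp;memory&nbsp;left&nbsp;by&nbsp;ecx&nbsp;bits
&nbsp;&nbsp;&nbsp;&nbsp;sarx&nbsp;edx,eax,ebx&nbsp;&nbsp;&nbsp;&nbsp;;&nbsp;arithmetic&nbsp;shift&nbsp;right&nbsp;by&nbsp;ebx&nbsp;bits

</pre>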

<div class="p"><!----></div>
<a class="a" id="rorx"></a>
<tt>rorx</tt> is a BMI2 instruction which rotates right the value from source
operand by the constant amount specified in third operand and stores the
result in destination without affecting the flags. The destination operand
has to be general register, the source operand can be general register or
memory, and the third operand has to be an immediate value.

<pre>
&nbsp;&nbsp;&nbsp;&nbsp;rorx&nbsp;eax,edx,7&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;;&nbsp;rotate&nbsp;without&nbsp;affecting&nbsp;flags

</pre>

<div class="p"><!----></div>
<a class="a" id="blsic"></a>
<a class="a" id="blsfill"></a>
The TBM is an extension designed by AMD to supplement the BMI set. The
<tt>bextr</tt> instruction is extended with a new form, in which second source is
a 32-bit immediate value. <tt>blsic</tt> is a new instruction which performs the
same operation as <tt>blsi</tt>, but with the bits of result inverted. It uses the
same rules for operands as <tt>blsi</tt>. <tt>blsfill</tt> is a new instruction, which takes
the value from source, sets all the bits below the lowest set bit and stores
the result in destination, it also uses the same rules for operands as <tt>blsi</tt>.
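
<div class="p"><!----></div>
As an illustrative sketch, the immediate below is assumed to follow the same
layout as the register form of <tt>bextr</tt> (position in the lowest 8 bits,
length in the next 8 bits), and the value of ECX is chosen arbitrarily.

<pre>
&nbsp;&nbsp;&nbsp;&nbsp;bextr&nbsp;eax,edx,0C04h&nbsp;&nbsp;;&nbsp;extract&nbsp;12&nbsp;bits&nbsp;starting&nbsp;at&nbsp;bit&nbsp;4
&nbsp;&nbsp;&nbsp;&nbsp;blsfill&nbsp;eax,ecx&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;;&nbsp;ecx&nbsp;=&nbsp;01011000b&nbsp;gives&nbsp;eax&nbsp;=&nbsp;01011111b

</pre>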

<div class="p"><!----></div>
<a class="a" id="blci"></a>
<a class="a" id="blcic"></a>
<a class="a" id="blcs"></a>
<a class="a" id="blcmsk"></a>
<a class="a" id="blcfill"></a>
<tt>blci</tt>, <tt>blcic</tt>, <tt>blcs</tt>, <tt>blcmsk</tt> and <tt>blcfill</tt> are instructions analogous
to <tt>blsi</tt>, <tt>blsic</tt>, <tt>blsr</tt>, <tt>blsmsk</tt> and <tt>blsfill</tt> respectively, but they
perform the bit-inverted versions of the same operations. They follow the
same rules for operands as the instructions they reflect.

<div class="p"><!----></div>
<a class="a" id="tzmsk"></a>
<a class="a" id="t1mskc"></a>
<tt>tzmsk</tt> finds the lowest set bit in value from source operand, sets all bits
below it to 1 and all the rest of bits to zero, then writes the result to
destination. <tt>t1mskc</tt> finds the least significant zero bit in the value from
source operand, sets the bits below it to zero and all the other bits to 1,
and writes the result to destination. These instructions have the same rules
for operands as <tt>blsi</tt>.
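
<div class="p"><!----></div>
A worked illustration on 32-bit operands, with the source values assumed only
for the purpose of the example:

<pre>
&nbsp;&nbsp;&nbsp;&nbsp;tzmsk&nbsp;eax,ecx&nbsp;&nbsp;&nbsp;;&nbsp;ecx&nbsp;=&nbsp;01011000b&nbsp;gives&nbsp;eax&nbsp;=&nbsp;00000111b
&nbsp;&nbsp;&nbsp;&nbsp;t1mskc&nbsp;eax,ecx&nbsp;&nbsp;;&nbsp;ecx&nbsp;=&nbsp;00000111b&nbsp;gives&nbsp;eax&nbsp;=&nbsp;0FFFFFFF8h

</pre>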

<div class="p"><!----></div>
     <a id="tth_sEc2.1.24"></a><h3>
2.1.24&nbsp;&nbsp;AVX-512 instructions</h3>

<div class="p"><!----></div>
The AVX-512 introduces 512-bit vector registers, which extend the 256-bit
registers used by AVX and AVX2. It also extends the set of vector registers
from 16 to 32, with the additional registers <tt>zmm16</tt> to <tt>zmm31</tt>, their low 
256-bit portions <tt>ymm16</tt> to <tt>ymm31</tt> and their low 128-bit portions <tt>xmm16</tt>
to <tt>xmm31</tt>. These additional registers can only be accessed in the long mode.

<div class="p"><!----></div>

<div class="p"><!----></div>
<a id="tth_tAb2.6">
</a> 
<div style="text-align:center">
<table border="1" class="tabular">
<tr><td align="center">Size </td><td colspan="7" align="center">Registers </td><td align="center"></td></tr><tr><td></td></tr>
<tr><td align="center">128-bit </td><td align="center"><tt>xmm16</tt> </td><td align="center"><tt>xmm17</tt> </td><td align="center"><tt>xmm18</tt> </td><td align="center"><tt>xmm19</tt> </td><td align="center"><tt>xmm20</tt> </td><td align="center"><tt>xmm21</tt> </td><td align="center"><tt>xmm22</tt> </td><td align="center"><tt>xmm23</tt> </td></tr>
<tr><td align="center"></td><td align="center"><tt>xmm24</tt> </td><td align="center"><tt>xmm25</tt> </td><td align="center"><tt>xmm26</tt> </td><td align="center"><tt>xmm27</tt> </td><td align="center"><tt>xmm28</tt> </td><td align="center"><tt>xmm29</tt> </td><td align="center"><tt>xmm30</tt> </td><td align="center"><tt>xmm31</tt> </td></tr>
<tr><td align="center">256-bit </td><td align="center"><tt>ymm16</tt> </td><td align="center"><tt>ymm17</tt> </td><td align="center"><tt>ymm18</tt> </td><td align="center"><tt>ymm19</tt> </td><td align="center"><tt>ymm20</tt> </td><td align="center"><tt>ymm21</tt> </td><td align="center"><tt>ymm22</tt> </td><td align="center"><tt>ymm23</tt> </td></tr>
<tr><td align="center"></td><td align="center"><tt>ymm24</tt> </td><td align="center"><tt>ymm25</tt> </td><td align="center"><tt>ymm26</tt> </td><td align="center"><tt>ymm27</tt> </td><td align="center"><tt>ymm28</tt> </td><td align="center"><tt>ymm29</tt> </td><td align="center"><tt>ymm30</tt> </td><td align="center"><tt>ymm31</tt> </td></tr>
<tr><td align="center">512-bit </td><td align="center"><tt>zmm16</tt> </td><td align="center"><tt>zmm17</tt> </td><td align="center"><tt>zmm18</tt> </td><td align="center"><tt>zmm19</tt> </td><td align="center"><tt>zmm20</tt> </td><td align="center"><tt>zmm21</tt> </td><td align="center"><tt>zmm22</tt> </td><td align="center"><tt>zmm23</tt> </td></tr>
<tr><td align="center"></td><td align="center"><tt>zmm24</tt> </td><td align="center"><tt>zmm25</tt> </td><td align="center"><tt>zmm26</tt> </td><td align="center"><tt>zmm27</tt> </td><td align="center"><tt>zmm28</tt> </td><td align="center"><tt>zmm29</tt> </td><td align="center"><tt>zmm30</tt> </td><td align="center"><tt>zmm31</tt> </td></tr></table>
</div>

<div style="text-align:center">Table 2.6: New registers available in long mode with AVX-512.</div>
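
<div class="p"><!----></div>
A minimal sketch of instructions using the additional registers, assuming that
the code is assembled for long mode (the choice of registers is arbitrary):

<pre>
&nbsp;&nbsp;&nbsp;&nbsp;vaddps&nbsp;zmm16,zmm17,zmm31&nbsp;&nbsp;;&nbsp;all&nbsp;operands&nbsp;in&nbsp;the&nbsp;new&nbsp;registers
&nbsp;&nbsp;&nbsp;&nbsp;vsubpd&nbsp;zmm20,zmm1,zmm21&nbsp;&nbsp;&nbsp;;&nbsp;old&nbsp;and&nbsp;new&nbsp;registers&nbsp;mixed

</pre>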

<div class="p"><!----></div>
In addition to new operand sizes and registers, the AVX-512 introduces
a number of supplementary settings that can be included in the operands
of AVX instructions.

<div class="p"><!----></div>
The destination operand of most AVX instructions can be followed
by the name of an opmask register enclosed in braces, this modifier
specifies a mask that decides which units of data in the destination
operand are going to be updated. The <tt>k0</tt> register cannot be used as a
destination mask. This setting can be further followed by <tt>{z}</tt> modifier
to choose that the data units not selected by mask should be zeroed
instead of leaving them unchanged.

<pre>
&nbsp;&nbsp;&nbsp;&nbsp;vaddpd&nbsp;zmm1{k1},zmm5,zword&nbsp;[rsi]&nbsp;&nbsp;;&nbsp;update&nbsp;selected&nbsp;floats
&nbsp;&nbsp;&nbsp;&nbsp;vaddps&nbsp;ymm6{k1}{z},ymm12,ymm24&nbsp;&nbsp;&nbsp;&nbsp;;&nbsp;update&nbsp;selected,&nbsp;zero&nbsp;other&nbsp;ones

</pre>

<div class="p"><!----></div>
When an instruction that operates on packed data has a source operand
loaded from a memory, the memory location may be just a single unit of data
and the source used for the operation is created by broadcasting this
value into all the units within the required size. To specify that such
broadcasting method is used the memory operand should be followed by one
of the <tt>{1to2}</tt>, <tt>{1to4}</tt>, <tt>{1to8}</tt>, <tt>{1to16}</tt>, <tt>{1to32}</tt> and <tt>{1to64}</tt>
modifiers, selecting the appropriate multiple of a unit.

<pre>
&nbsp;&nbsp;&nbsp;&nbsp;vsubps&nbsp;zmm1,zmm2,dword&nbsp;[rsi]&nbsp;{1to16}&nbsp;;&nbsp;subtract&nbsp;from&nbsp;all&nbsp;floats

</pre>

<div class="p"><!----></div>
When an instruction does not use a memory operand, often an additional
operand may follow the source operands, containing the rounding mode
specifier. When an instruction has variants that operate on different
sizes of data, the rounding mode can be specified only when the
register operands are 512-bit.

<pre>
&nbsp;&nbsp;&nbsp;&nbsp;vdivps&nbsp;zmm2,zmm3,zmm5,{ru-sae}&nbsp;&nbsp;&nbsp;&nbsp;;&nbsp;round&nbsp;results&nbsp;up

</pre>

<div class="p"><!----></div>

<div class="p"><!----></div>
<a id="tth_tAb2.7">
</a> 
<div style="text-align:center">
<table border="1" class="tabular">
<tr><td align="center">Operand </td><td align="center">Description </td></tr><tr><td></td></tr>
<tr><td align="center"><tt>{rn-sae}</tt> </td><td align="center">round to nearest and suppress all exceptions </td></tr>
<tr><td align="center"><tt>{rd-sae}</tt> </td><td align="center">round down and suppress all exceptions </td></tr>
<tr><td align="center"><tt>{ru-sae}</tt> </td><td align="center">round up and suppress all exceptions </td></tr>
<tr><td align="center"><tt>{rz-sae}</tt> </td><td align="center">round toward zero and suppress all exceptions </td></tr></table>
</div>

<div style="text-align:center">Table 2.7: AVX-512 rounding modes.</div>

<div class="p"><!----></div>
Some of the instructions do not use a rounding mode but still allow
the exception suppression option to be specified with <tt>{sae}</tt> modifier
in the additional operand.

<pre>
&nbsp;&nbsp;&nbsp;&nbsp;vmaxpd&nbsp;zmm0,zmm1,zmm2,{sae}&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;;&nbsp;suppress&nbsp;all&nbsp;exceptions

</pre>

<div class="p"><!----></div>
The family of <tt>gather</tt> instructions in their AVX-512 variants use a new
syntax with only two operands. The opmask register takes the role which
was played by the third operand in the AVX2 syntax and it is mandatory
in this case.

<pre>
&nbsp;&nbsp;&nbsp;&nbsp;vgatherdps&nbsp;xmm0{k1},[eax+xmm1]&nbsp;&nbsp;&nbsp;&nbsp;;&nbsp;gather&nbsp;four&nbsp;floats
&nbsp;&nbsp;&nbsp;&nbsp;vgatherdpd&nbsp;zmm0{k3},[ymm3*8]&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;;&nbsp;gather&nbsp;eight&nbsp;doubles

</pre>

<div class="p"><!----></div>
The new family of <tt>scatter</tt> instructions performs an operation reverse to
the one of <tt>gather</tt>. They also take two operands, the destination is a 
memory with vector indexing and opmask modifier, and the source is a vector
register.

<pre>
&nbsp;&nbsp;&nbsp;&nbsp;vscatterdps&nbsp;[eax+xmm1]{k1},xmm0&nbsp;&nbsp;&nbsp;&nbsp;;&nbsp;scatter&nbsp;four&nbsp;floats
&nbsp;&nbsp;&nbsp;&nbsp;vscatterdpd&nbsp;[ymm3*8]{k3},zmm0&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;;&nbsp;scatter&nbsp;eight&nbsp;doubles

</pre>

<div class="p"><!----></div>
     <a id="tth_sEc2.1.25"></a><h3>
2.1.25&nbsp;&nbsp;Other extensions of instruction set</h3>

<div class="p"><!----></div>
There are a number of additional instruction set extensions recognized by flat
assembler, and the general syntax of the instructions introduced by those
extensions is provided here. For detailed information on the operations
performed by them, check out the manuals from Intel (for the VMX, SMX, XSAVE,
RDRAND, FSGSBASE, INVPCID, HLE, RTM, and MPX extensions) or AMD (for the SVM extension).

<div class="p"><!----></div>


<a class="a" id="vmxon"></a>
<a class="a" id="vmxoff"></a>
<a class="a" id="vmlaunch"></a>
<a class="a" id="vmresume"></a>
<a class="a" id="vmcall"></a>
The Virtual-Machine Extensions (VMX) provide a set of instructions for the
management of virtual machines. The <tt>vmxon</tt> instruction, which enters the VMX
operation, requires a single 64-bit memory operand, which should be a physical
address of memory region, which the logical processor may use to support VMX
operation. The <tt>vmxoff</tt> instruction, which leaves the VMX operation, has no
operands. The <tt>vmlaunch</tt> and <tt>vmresume</tt>, which launch or resume the virtual
machines, and <tt>vmcall</tt>, which allows guest software to call the VM monitor,
use no operands either.

<div class="p"><!----></div>

<a class="a" id="vmptrld"></a>
<a class="a" id="vmptrst"></a>
<a class="a" id="vmclear"></a>
The <tt>vmptrld</tt> loads the physical address of current Virtual Machine Control
Structure (VMCS) from its memory operand, <tt>vmptrst</tt> stores the pointer to
current VMCS into address specified by its memory operand, and <tt>vmclear</tt> sets
the launch state of the VMCS referenced by its memory operand to clear. These
three instructions all require a single 64-bit memory operand.

<div class="p"><!----></div>
<a class="a" id="vmread"></a>
<a class="a" id="vmwrite"></a>
The <tt>vmread</tt> reads from VMCS a field specified by the source operand and
stores it into the destination operand. The source operand should be a
general purpose register, and the destination operand can be a register or
memory. The <tt>vmwrite</tt> writes into a VMCS field specified by the destination
operand the value provided by source operand. The source operand can be a
general purpose register or memory, and the destination operand must be a
register. The size of operands for those instructions should be 64-bit when
in long mode, and 32-bit otherwise.

<div class="p"><!----></div>
<a class="a" id="invept"></a>
<a class="a" id="invvpid"></a>
The <tt>invept</tt> and <tt>invvpid</tt> invalidate the translation lookaside buffers
(TLBs) and paging-structure caches, either derived from extended page tables
(EPT), or based on the virtual processor identifier (VPID). These instructions
require two operands, the first one being the general purpose register
specifying the type of invalidation, and the second one being a 128-bit
memory operand providing the invalidation descriptor. The first operand
should be a 64-bit register when in long mode, and 32-bit register otherwise.

<div class="p"><!----></div>
<a class="a" id="getsec"></a>
The Safer Mode Extensions (SMX) provide the functionalities available
through the <tt>getsec</tt> instruction. This instruction takes no operands, and
the function that is executed is determined by the contents of EAX register
upon executing this instruction.

<div class="p"><!----></div>
<a class="a" id="skinit"></a>
The Secure Virtual Machine (SVM) is a variant of virtual machine extension
used by AMD. The <tt>skinit</tt> instruction securely reinitializes the processor
allowing the startup of trusted software, such as the virtual machine monitor
(VMM). This instruction takes a single operand, which must be EAX, and
provides a physical address of the secure loader block (SLB).

<div class="p"><!----></div>
<a class="a" id="vmrun"></a>
<a class="a" id="vmsave"></a>
<a class="a" id="vmload"></a>
The <tt>vmrun</tt> instruction is used to start a guest virtual machine,
its only operand should be an accumulator register (AX, EAX or RAX, the
last one available only in long mode) providing the physical address of the
virtual machine control block (VMCB). The <tt>vmsave</tt> stores a subset of
processor state into VMCB specified by its operand, and <tt>vmload</tt> loads the
same subset of processor state from a specified VMCB. The same operand rules
as for the <tt>vmrun</tt> apply to those two instructions.

<div class="p"><!----></div>
<a class="a" id="vmmcall"></a>
<tt>vmmcall</tt> allows the guest software to call the VMM. This instruction takes
no operands.

<div class="p"><!----></div>
<a class="a" id="stgi"></a>
<a class="a" id="clgi"></a>
<tt>stgi</tt> sets the global interrupt flag to 1, and <tt>clgi</tt> zeroes it. These
instructions take no operands.

<div class="p"><!----></div>
<a class="a" id="invlpga"></a>
<tt>invlpga</tt> invalidates the TLB mapping for a virtual page specified by the
first operand (which has to be accumulator register) and address space
identifier specified by the second operand (which must be ECX register).

<div class="p"><!----></div>
<a class="a" id="xsave"></a>

<a class="a" id="xsaveopt"></a>
<a class="a" id="xrstor"></a>
<a class="a" id="xsave64"></a>
<a class="a" id="xsaveopt64"></a>
<a class="a" id="xrstor64"></a>
The XSAVE set of instructions allows saving and restoring processor state
components. <tt>xsave</tt> and <tt>xsaveopt</tt> store the components of processor state
defined by bit mask in EDX and EAX registers into area defined by memory
operand. <tt>xrstor</tt> restores from the area specified by memory operand the
components of processor state defined by mask in EDX and EAX. The <tt>xsave64</tt>,
<tt>xsaveopt64</tt> and <tt>xrstor64</tt> are 64-bit versions of these instructions, allowed
only in long mode.
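
<div class="p"><!----></div>
A minimal sketch of saving and restoring state follows. It assumes that EBX
points to a suitably sized and 64-byte aligned save area, and that the mask
value 3 selects the x87 and SSE state components, as defined in the Intel
manuals.

<pre>
&nbsp;&nbsp;&nbsp;&nbsp;mov&nbsp;eax,3&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;;&nbsp;low&nbsp;half&nbsp;of&nbsp;mask&nbsp;(assumed:&nbsp;x87&nbsp;and&nbsp;SSE&nbsp;state)
&nbsp;&nbsp;&nbsp;&nbsp;xor&nbsp;edx,edx&nbsp;&nbsp;&nbsp;;&nbsp;high&nbsp;half&nbsp;of&nbsp;mask
&nbsp;&nbsp;&nbsp;&nbsp;xsave&nbsp;[ebx]&nbsp;&nbsp;&nbsp;;&nbsp;store&nbsp;the&nbsp;selected&nbsp;components
&nbsp;&nbsp;&nbsp;&nbsp;xrstor&nbsp;[ebx]&nbsp;&nbsp;;&nbsp;restore&nbsp;them,&nbsp;mask&nbsp;still&nbsp;in&nbsp;edx&nbsp;and&nbsp;eax

</pre>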

<div class="p"><!----></div>
<a class="a" id="xgetbv"></a>
<a class="a" id="xsetbv"></a>
<tt>xgetbv</tt> reads the contents of 64-bit XCR (extended control register)
specified in ECX register into EDX and EAX registers. <tt>xsetbv</tt> writes the
contents of EDX and EAX into the 64-bit XCR specified by ECX register. These
instructions have no operands.
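
<div class="p"><!----></div>
For illustration, reading the register with index 0 (known as XCR0 in the
Intel manuals) could be sketched as:

<pre>
&nbsp;&nbsp;&nbsp;&nbsp;xor&nbsp;ecx,ecx&nbsp;&nbsp;;&nbsp;select&nbsp;XCR&nbsp;number&nbsp;0
&nbsp;&nbsp;&nbsp;&nbsp;xgetbv&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;;&nbsp;contents&nbsp;returned&nbsp;in&nbsp;edx&nbsp;and&nbsp;eax

</pre>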

<div class="p"><!----></div>
<a class="a" id="rdrand"></a>
The RDRAND extension introduces one new instruction, <tt>rdrand</tt>, which loads
the hardware-generated random value into general register. It takes one
operand, which can be 16-bit, 32-bit or 64-bit register (with the last one
being allowed only in long mode).
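
<div class="p"><!----></div>
A possible usage sketch follows. It relies on the behavior described in the
Intel manuals, where the carry flag is set only when a random value has been
delivered. The label <tt>retry</tt> is a hypothetical name used for this example.

<pre>
&nbsp;&nbsp;retry:
&nbsp;&nbsp;&nbsp;&nbsp;rdrand&nbsp;eax&nbsp;&nbsp;;&nbsp;try&nbsp;to&nbsp;get&nbsp;32-bit&nbsp;random&nbsp;value
&nbsp;&nbsp;&nbsp;&nbsp;jnc&nbsp;retry&nbsp;&nbsp;&nbsp;;&nbsp;no&nbsp;value&nbsp;was&nbsp;available,&nbsp;try&nbsp;again

</pre>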

<div class="p"><!----></div>

<a class="a" id="rdfsbase"></a>
<a class="a" id="rdgsbase"></a>
<a class="a" id="wrfsbase"></a>
<a class="a" id="wrgsbase"></a>
The FSGSBASE extension adds long mode instructions that allow reading and
writing the segment base registers for FS and GS segments. <tt>rdfsbase</tt> and
<tt>rdgsbase</tt> read the corresponding segment base registers into operand, while
<tt>wrfsbase</tt> and <tt>wrgsbase</tt> write the value of operand into those registers.
All these instructions take one operand, which can be 32-bit or 64-bit general
register.
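
<div class="p"><!----></div>
A short sketch with arbitrarily chosen registers:

<pre>
&nbsp;&nbsp;&nbsp;&nbsp;rdfsbase&nbsp;rax&nbsp;&nbsp;;&nbsp;read&nbsp;the&nbsp;base&nbsp;of&nbsp;FS&nbsp;segment
&nbsp;&nbsp;&nbsp;&nbsp;wrgsbase&nbsp;rcx&nbsp;&nbsp;;&nbsp;set&nbsp;the&nbsp;base&nbsp;of&nbsp;GS&nbsp;segment

</pre>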

<div class="p"><!----></div>
<a class="a" id="invpcid"></a>
The INVPCID extension adds the <tt>invpcid</tt> instruction, which invalidates mappings
in the TLBs and paging caches based on the invalidation type specified in the
first operand and the PCID invalidate descriptor specified in the second operand.
The first operand should be a 32-bit general register when not in long mode,
or a 64-bit general register when in long mode. The second operand should be a
128-bit memory location.

<div class="p"><!----></div>

<a class="a" id="xacquire"></a>
<a class="a" id="xrelease"></a>
<a class="a" id="xbegin"></a>
<a class="a" id="xend"></a>
<a class="a" id="xabort"></a>
<a class="a" id="xtest"></a>
The HLE and RTM extensions provide a set of instructions for managing
transactional execution. The <tt>xacquire</tt> and <tt>xrelease</tt> are new prefixes that can be used
with some of the instructions to start or end lock elision on the memory
address specified by the prefixed instruction. The <tt>xbegin</tt> instruction starts
transactional execution; its operand is the address of a fallback routine
that gets executed in case of a transaction abort, specified like the operand
of a near jump instruction. <tt>xend</tt> marks the end of the transactional execution
region and takes no operands. <tt>xabort</tt> forces a transaction abort; it takes
an 8-bit immediate value as its only operand, and this value is passed in the
highest bits of EAX to the fallback routine. <tt>xtest</tt> checks whether
transactional execution is in progress; this instruction takes no operands.
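
<div class="p"><!----></div>
The usual shape of an RTM region can be sketched as follows. The <tt>counter</tt>, <tt>fallback</tt> and <tt>done</tt> names are hypothetical, and the fallback path shown here (a locked increment) is only one possible choice:

<pre>
&nbsp;&nbsp;&nbsp;&nbsp;xbegin&nbsp;fallback&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;;&nbsp;start&nbsp;the&nbsp;transaction
&nbsp;&nbsp;&nbsp;&nbsp;inc&nbsp;dword&nbsp;[counter]&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;;&nbsp;executed&nbsp;transactionally
&nbsp;&nbsp;&nbsp;&nbsp;xend&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;;&nbsp;commit&nbsp;the&nbsp;transaction
&nbsp;&nbsp;&nbsp;&nbsp;jmp&nbsp;done
&nbsp;&nbsp;fallback:&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;;&nbsp;reached&nbsp;after&nbsp;an&nbsp;abort,&nbsp;EAX&nbsp;holds&nbsp;the&nbsp;abort&nbsp;status
&nbsp;&nbsp;&nbsp;&nbsp;lock&nbsp;inc&nbsp;dword&nbsp;[counter]&nbsp;&nbsp;;&nbsp;non-transactional&nbsp;replacement
&nbsp;&nbsp;done:

</pre>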

<div class="p"><!----></div>
The MPX extension adds instructions that operate on new bounds registers
and aid in checking the memory references. For some of these instructions
flat assembler allows a special syntax that gives fine control over their
operation, where the address of a memory operand is separated into two parts
with a comma. With the <tt>bndmk</tt> instruction the first part of such an address specifies
the lower bound and the second one the upper bound. The lower bound can be
either zero or a register, the upper bound can be any address that uses no more
than one register (multiplied by 1, 2, 4, or 8). The addressing registers need to
be 64-bit when in long mode, and 32-bit otherwise.

<pre>
&nbsp;&nbsp;&nbsp;&nbsp;bndmk&nbsp;bnd0,[rbx,100000h]&nbsp;;&nbsp;lower&nbsp;bound&nbsp;in&nbsp;register,&nbsp;upper&nbsp;directly
&nbsp;&nbsp;&nbsp;&nbsp;bndmk&nbsp;bnd1,[0,rbx]&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;;&nbsp;lower&nbsp;bound&nbsp;zero,&nbsp;upper&nbsp;in&nbsp;register

</pre>
In case of <tt>bndldx</tt> and <tt>bndstx</tt>, the first part of memory operand specifies an
address used to access a bound table entry, while the second part is either zero
or a register that plays a role of an additional operand for such instruction.
The address in the first part may use no more than one register and the register
cannot be multiplied by a number other than 1.

<pre>
&nbsp;&nbsp;&nbsp;&nbsp;bndstx&nbsp;[rcx,rsi],bnd3&nbsp;&nbsp;;&nbsp;store&nbsp;bnd3&nbsp;and&nbsp;rsi&nbsp;at&nbsp;rcx&nbsp;in&nbsp;the&nbsp;bound&nbsp;table
&nbsp;&nbsp;&nbsp;&nbsp;bndldx&nbsp;bnd2,[rcx,rsi]&nbsp;&nbsp;;&nbsp;load&nbsp;from&nbsp;bound&nbsp;table&nbsp;if&nbsp;entry&nbsp;matches&nbsp;rsi

</pre>

<div class="p"><!----></div>
 <a id="tth_sEc2.2"></a><h2>
2.2&nbsp;&nbsp;Control directives</h2>
<a id="sec:control">
</a>
This section describes the directives that control the assembly process, they
are processed during the assembly and may cause some blocks of instructions
to be assembled differently or not assembled at all.

<div class="p"><!----></div>
     <a id="tth_sEc2.2.1"></a><h3>
2.2.1&nbsp;&nbsp;Numerical constants</h3>
<a class="a" id="_"></a>

The <tt>=</tt> directive defines a numerical constant. It should be preceded by
the name for the constant and followed by the numerical expression providing the value.
The value of such a constant can be a number or an address, but - unlike labels -
numerical constants are not allowed to hold register-based addresses.
Besides this difference, in their basic variant numerical constants behave
very much like labels and you can even forward-reference them (access their
values before they actually get defined).
................................................................................
</pre>
which declares label placed at <tt>ebp+4</tt> address. However remember that labels,
unlike numerical constants, cannot become assembly-time variables.

<div class="p"><!----></div>
     <a id="tth_sEc2.2.2"></a><h3>
2.2.2&nbsp;&nbsp;Conditional assembly</h3>
<a class="a" id="IF"></a>

<tt>if</tt> directive causes some block of instructions to be assembled only under
certain condition. It should be followed by logical expression specifying the
condition, instructions in next lines will be assembled only when this
condition is met, otherwise they will be skipped. The optional <tt>else&nbsp;if</tt>
directive followed with logical expression specifying additional condition
begins the next block of instructions that will be assembled if previous
conditions were not met, and the additional condition is met. The optional
................................................................................
<tt>eax,16&nbsp;eqtype&nbsp;fs,3+7</tt> condition is true, but <tt>eax,16&nbsp;eqtype&nbsp;eax,1.6</tt> is false.

<div class="p"><!----></div>
     <a id="tth_sEc2.2.3"></a><h3>
2.2.3&nbsp;&nbsp;Repeating blocks of instructions</h3>
<a id="sec:repeating">
</a>
<a class="a" id="TIMES"></a>

<tt>times</tt> directive repeats one instruction specified number of times. It
should be followed by numerical expression specifying number of repeats and
the instruction to repeat (optionally colon can be used to separate number and
instruction). When special symbol <tt>%</tt> is used inside the instruction, it is
equal to the number of current repeat. For example <tt>times&nbsp;5&nbsp;db&nbsp;%</tt> will define
five bytes with values 1, 2, 3, 4, 5. Recursive use of <tt>times</tt> directive is
also allowed, so <tt>times&nbsp;3&nbsp;times&nbsp;%&nbsp;db&nbsp;%</tt> will define six bytes with values
1, 1, 2, 1, 2, 3.
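
<div class="p"><!----></div>
Because the number of repeats is an ordinary numerical expression, <tt>times</tt> is often used for padding. The following sketch assumes a 512-byte boot sector image and fills the space between the end of the code and the signature word with zeros:

<pre>
&nbsp;&nbsp;&nbsp;&nbsp;times&nbsp;510-($-$$)&nbsp;db&nbsp;0&nbsp;&nbsp;&nbsp;;&nbsp;pad&nbsp;with&nbsp;zeros&nbsp;up&nbsp;to&nbsp;offset&nbsp;510
&nbsp;&nbsp;&nbsp;&nbsp;dw&nbsp;0AA55h&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;;&nbsp;boot&nbsp;signature&nbsp;in&nbsp;the&nbsp;last&nbsp;two&nbsp;bytes

</pre>
Here <tt>$-$$</tt> is the number of bytes generated so far in the current addressing space.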

<div class="p"><!----></div>
<a class="a" id="REPEAT"></a>
<tt>repeat</tt> directive repeats the whole block of instructions. It should be
followed by numerical expression specifying number of repeats. Instructions
to repeat are expected in next lines, ended with the <tt>end&nbsp;repeat</tt> directive,
for example:

<pre>
&nbsp;&nbsp;&nbsp;&nbsp;repeat&nbsp;8
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;mov&nbsp;byte&nbsp;[bx],%
................................................................................
addressed by BX register.

<div class="p"><!----></div>
Number of repeats can be zero, in that case the instructions are not
assembled at all.

<div class="p"><!----></div>
<a class="a" id="BREAK"></a>
The <tt>break</tt> directive stops the repeating early and continues assembly
from the first line after the <tt>end&nbsp;repeat</tt>. Combined with the <tt>if</tt> directive it
makes it possible to stop repeating under some special condition, like:

<pre>
&nbsp;&nbsp;&nbsp;&nbsp;s&nbsp;=&nbsp;x/2
&nbsp;&nbsp;&nbsp;&nbsp;repeat&nbsp;100
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;if&nbsp;x/s&nbsp;=&nbsp;s
................................................................................
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;end&nbsp;if
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;s&nbsp;=&nbsp;(s+x/s)/2
&nbsp;&nbsp;&nbsp;&nbsp;end&nbsp;repeat

</pre>

<div class="p"><!----></div>
<a class="a" id="WHILE"></a>
The <tt>while</tt> directive repeats the block of instructions as long as the
condition specified by the logical expression following it is true. The block
of instructions to be repeated should end with the <tt>end&nbsp;while</tt> directive.
Before each repetition the logical expression is evaluated and when its value
is false, the assembly is continued starting from the first line after the
<tt>end&nbsp;while</tt>. Also in this case the <tt>%</tt> symbol holds the number of current
repeat. The <tt>break</tt> directive can be used to stop this kind of loop in the same
way as with <tt>repeat</tt> directive. The previous sample can be rewritten to use the
................................................................................
however they should be closed in the same order in which they were started. The
<tt>break</tt> directive always stops processing the block that was started last with
either the <tt>repeat</tt> or <tt>while</tt> directive.

<div class="p"><!----></div>
     <a id="tth_sEc2.2.4"></a><h3>
2.2.4&nbsp;&nbsp;Addressing spaces</h3>
<a class="a" id="ORG"></a>

<tt>org</tt> directive sets the address at which the following code is expected to
appear in memory. It should be followed by a numerical expression specifying
the address. This directive begins a new addressing space; the following
code itself is not moved in any way, but all the labels defined within it
and the value of the <tt>$</tt> symbol are affected as if it was put at the given
address. However, it is the responsibility of the programmer to put the code at the
correct address at run-time.
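
<div class="p"><!----></div>
For illustration, a DOS .COM program is loaded at offset 100h of its segment, so such a source typically starts with <tt>org&nbsp;100h</tt>. In this sketch the <tt>start</tt> and <tt>message</tt> labels are hypothetical:

<pre>
&nbsp;&nbsp;&nbsp;&nbsp;org&nbsp;100h
&nbsp;&nbsp;start:
&nbsp;&nbsp;&nbsp;&nbsp;mov&nbsp;dx,message&nbsp;&nbsp;&nbsp;&nbsp;;&nbsp;DX&nbsp;=&nbsp;100h&nbsp;+&nbsp;position&nbsp;of&nbsp;message&nbsp;in&nbsp;the&nbsp;file
&nbsp;&nbsp;&nbsp;&nbsp;mov&nbsp;ah,9&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;;&nbsp;DOS&nbsp;function:&nbsp;print&nbsp;string&nbsp;ended&nbsp;with&nbsp;'$'
&nbsp;&nbsp;&nbsp;&nbsp;int&nbsp;21h
&nbsp;&nbsp;&nbsp;&nbsp;int&nbsp;20h&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;;&nbsp;terminate&nbsp;the&nbsp;program
&nbsp;&nbsp;message&nbsp;db&nbsp;'Hello!',24h

</pre>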

<div class="p"><!----></div>
<a class="a" id="LOAD"></a>
The <tt>load</tt> directive defines a constant with a binary value loaded
from the already assembled code. This directive should be followed by the name
of the constant, then optionally a size operator, then the <tt>from</tt> operator and a
numerical expression specifying a valid address in the current addressing space.
The size operator has an unusual meaning in this case - it states how many bytes
(up to 8) have to be loaded to form the binary value of the constant. If no size
operator is specified, one byte is loaded (thus the value is in range from 0 to
255). The loaded data cannot exceed the current offset.
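
<div class="p"><!----></div>
A short sketch of both variants (the <tt>text</tt>, <tt>first</tt> and <tt>both</tt> names are hypothetical):

<pre>
&nbsp;&nbsp;&nbsp;&nbsp;text&nbsp;db&nbsp;'Hi'
&nbsp;&nbsp;&nbsp;&nbsp;load&nbsp;first&nbsp;byte&nbsp;from&nbsp;text&nbsp;&nbsp;&nbsp;;&nbsp;first&nbsp;=&nbsp;48h,&nbsp;the&nbsp;code&nbsp;of&nbsp;'H'
&nbsp;&nbsp;&nbsp;&nbsp;load&nbsp;both&nbsp;word&nbsp;from&nbsp;text&nbsp;&nbsp;&nbsp;&nbsp;;&nbsp;both&nbsp;=&nbsp;6948h,&nbsp;with&nbsp;'H'&nbsp;in&nbsp;the&nbsp;low&nbsp;byte

</pre>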

<div class="p"><!----></div>
<a class="a" id="STORE"></a>
The <tt>store</tt> directive can modify the already generated code by replacing
some of the previously generated data with the value defined by the given
numerical expression, which follows. The expression can be preceded by the
optional size operator to specify how large a value the expression defines, and
therefore how many bytes will be stored; if there is no size operator, the
size of one byte is assumed. Then the <tt>at</tt> operator should follow, and the numerical
expression defining the valid address in the current addressing space, at
which the given value has to be stored. This is a directive for
................................................................................
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;store&nbsp;byte&nbsp;a&nbsp;xor&nbsp;c&nbsp;at&nbsp;$$+%-1
&nbsp;&nbsp;&nbsp;&nbsp;end&nbsp;repeat

</pre>
and each byte of code will be xored with the value defined by <tt>c</tt> constant.

<div class="p"><!----></div>
<a class="a" id="VIRTUAL"></a>
<tt>virtual</tt> defines virtual data at specified address. This data will not be
included in the output file, but labels defined there can be used in other
parts of source. This directive can be followed by the <tt>at</tt> operator and the
numerical expression specifying the address for virtual data, otherwise it
uses the current address, the same as <tt>virtual&nbsp;at&nbsp;$</tt>. Instructions defining data
are expected in next lines, ended with <tt>end&nbsp;virtual</tt> directive. The block of
virtual instructions itself is an independent addressing space, after it's
ended, the context of previous addressing space is restored.
................................................................................
limited by the boundaries of the block.                

<div class="p"><!----></div>
     <a id="tth_sEc2.2.5"></a><h3>
2.2.5&nbsp;&nbsp;Other directives</h3>
<a id="sec:other">
</a>
<a class="a" id="ALIGN"></a>

<tt>align</tt> directive aligns code or data to the specified boundary. It should
be followed by a numerical expression specifying the number of bytes, to a
multiple of which the current address has to be aligned. The boundary value
has to be a power of two.
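
<div class="p"><!----></div>
A minimal sketch, assuming the block starts at an address divisible by 4 (the <tt>value</tt> label is hypothetical):

<pre>
&nbsp;&nbsp;&nbsp;&nbsp;db&nbsp;1,2,3&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;;&nbsp;three&nbsp;bytes,&nbsp;the&nbsp;address&nbsp;is&nbsp;no&nbsp;longer&nbsp;a&nbsp;multiple&nbsp;of&nbsp;4
&nbsp;&nbsp;&nbsp;&nbsp;align&nbsp;4&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;;&nbsp;one&nbsp;byte&nbsp;is&nbsp;skipped&nbsp;here
&nbsp;&nbsp;&nbsp;&nbsp;value&nbsp;dd&nbsp;0&nbsp;&nbsp;&nbsp;&nbsp;;&nbsp;starts&nbsp;on&nbsp;a&nbsp;4-byte&nbsp;boundary

</pre>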

<div class="p"><!----></div>
The <tt>align</tt> directive fills the bytes that had to be skipped to perform the
................................................................................

</pre>
The <tt>a</tt> constant is defined to be the difference between address after alignment
and address of the <tt>virtual</tt> block (see previous section), so it is equal to
the size of needed alignment space.

<div class="p"><!----></div>
<a class="a" id="DISPLAY"></a>
<tt>display</tt> directive displays the message at the assembly time. It should
be followed by the quoted strings or byte values, separated with commas. It
can be used to display values of some constants, for example:

<pre>
&nbsp;&nbsp;&nbsp;&nbsp;bits&nbsp;=&nbsp;16
&nbsp;&nbsp;&nbsp;&nbsp;display&nbsp;'Current&nbsp;offset&nbsp;is&nbsp;0x'
&nbsp;&nbsp;&nbsp;&nbsp;repeat&nbsp;bits/4
................................................................................
All preprocessor directives are processed before the main assembly process,
and therefore are not affected by the control directives. At this time also
all comments are stripped out.

<div class="p"><!----></div>
     <a id="tth_sEc2.3.1"></a><h3>
2.3.1&nbsp;&nbsp;Including source files</h3>
<a class="a" id="INCLUDE"></a>

<tt>include</tt> directive includes the specified source file at the position
where it is used. It should be followed by the quoted name of file that
should be included, for example:

<pre>
&nbsp;&nbsp;&nbsp;&nbsp;include&nbsp;'macros.inc'

................................................................................
<a id="sec:symbolic_constants">
</a>
The symbolic constants are different from the numerical constants, before the
assembly process they are replaced with their values everywhere in source
lines after their definitions, and anything can become their values.

<div class="p"><!----></div>
<a class="a" id="EQU"></a>
The definition of symbolic constant consists of name of the constant followed
by the <tt>equ</tt> directive. Everything that follows this directive will
become the value of constant. If the value of symbolic constant contains
other symbolic constants, they are replaced with their values before
assigning this value to the new constant. For example:

<pre>
&nbsp;&nbsp;&nbsp;&nbsp;d&nbsp;equ&nbsp;dword
................................................................................
&nbsp;&nbsp;&nbsp;&nbsp;d&nbsp;equ&nbsp;d,eax

</pre>
the <tt>d</tt> constant would get the new value of <tt>edx,eax</tt>. This way the growing
lists of symbols can be defined.

<div class="p"><!----></div>
<a class="a" id="RESTORE"></a>
<tt>restore</tt> directive brings back the previous value of a redefined symbolic
constant. It should be followed by one or more names of symbolic constants,
separated with commas. So <tt>restore&nbsp;d</tt> after the above definitions will give the
<tt>d</tt> constant back the value <tt>edx</tt>, the second one will restore it to the value
<tt>dword</tt>, and one more will revert <tt>d</tt> to its original meaning as if no such
constant was defined. If no constant of the given name was defined,
<tt>restore</tt> will not cause an error, it will just be ignored.
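
<div class="p"><!----></div>
The sequence described above can be sketched like this, restating the three definitions of <tt>d</tt> that the text assumes:

<pre>
&nbsp;&nbsp;&nbsp;&nbsp;d&nbsp;equ&nbsp;dword
&nbsp;&nbsp;&nbsp;&nbsp;d&nbsp;equ&nbsp;edx
&nbsp;&nbsp;&nbsp;&nbsp;d&nbsp;equ&nbsp;d,eax&nbsp;&nbsp;&nbsp;;&nbsp;d&nbsp;is&nbsp;now&nbsp;edx,eax
&nbsp;&nbsp;&nbsp;&nbsp;restore&nbsp;d&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;;&nbsp;d&nbsp;is&nbsp;edx&nbsp;again
&nbsp;&nbsp;&nbsp;&nbsp;restore&nbsp;d&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;;&nbsp;d&nbsp;is&nbsp;dword&nbsp;again
&nbsp;&nbsp;&nbsp;&nbsp;restore&nbsp;d&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;;&nbsp;d&nbsp;is&nbsp;no&nbsp;longer&nbsp;a&nbsp;symbolic&nbsp;constant
&nbsp;&nbsp;&nbsp;&nbsp;restore&nbsp;d&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;;&nbsp;nothing&nbsp;to&nbsp;restore,&nbsp;silently&nbsp;ignored

</pre>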

................................................................................

</pre>
After this definition <tt>mov&nbsp;ax,offset&nbsp;char</tt> will be valid construction
for copying the offset of <tt>char</tt> variable into <tt>ax</tt> register,
because <tt>offset</tt> is replaced with an empty value, and therefore ignored.

<div class="p"><!----></div>
<a class="a" id="DEFINE"></a>
The <tt>define</tt> directive, followed by the name of the constant and then the value,
is an alternative way of defining a symbolic constant. The only difference
between <tt>define</tt> and <tt>equ</tt> is that <tt>define</tt> assigns the value as it is; it does
not replace the symbolic constants with their values inside it.
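
<div class="p"><!----></div>
The difference can be sketched with three hypothetical constants; the comments describe only the values that get stored:

<pre>
&nbsp;&nbsp;&nbsp;&nbsp;a&nbsp;equ&nbsp;eax
&nbsp;&nbsp;&nbsp;&nbsp;b&nbsp;equ&nbsp;a&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;;&nbsp;a&nbsp;is&nbsp;replaced&nbsp;first,&nbsp;so&nbsp;the&nbsp;value&nbsp;of&nbsp;b&nbsp;is&nbsp;eax
&nbsp;&nbsp;&nbsp;&nbsp;define&nbsp;c&nbsp;a&nbsp;&nbsp;&nbsp;&nbsp;;&nbsp;nothing&nbsp;is&nbsp;replaced,&nbsp;the&nbsp;value&nbsp;of&nbsp;c&nbsp;is&nbsp;the&nbsp;symbol&nbsp;a

</pre>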

<div class="p"><!----></div>
<a class="a" id="FIX"></a>
Symbolic constants can also be defined with the <tt>fix</tt> directive, which has
the same syntax as <tt>equ</tt>, but defines constants of high priority - they are
replaced with their symbolic values even before processing the preprocessor
directives and macroinstructions, the only exception is <tt>fix</tt> directive
itself, which has the highest possible priority, so it allows redefinition of
constants defined this way.

<div class="p"><!----></div>
................................................................................
with <tt>equ</tt> directive wouldn't give such result, as standard symbolic constants
are replaced with their values after searching the line for preprocessor
directives.

<div class="p"><!----></div>
     <a id="tth_sEc2.3.3"></a><h3>
2.3.3&nbsp;&nbsp;Macroinstructions</h3>
<a class="a" id="MACRO"></a>

<tt>macro</tt> directive allows you to define your own complex instructions,
called macroinstructions, which can greatly simplify the process of
programming. In its simplest form it's similar to a symbolic constant
definition. For example the following definition defines a shortcut for the
<tt>test&nbsp;al,0xFF</tt> instruction:

<pre>
................................................................................
<div class="p"><!----></div>
When it's needed to provide macroinstruction with argument that contains
some commas, such argument should be enclosed between <tt>&lt;</tt> and <tt>&gt;</tt>
characters. If it contains more than one <tt>&lt;</tt> character, the same number
of <tt>&gt;</tt> should be used to tell that the value of argument ends.

<div class="p"><!----></div>
When the name of the last argument of macroinstruction is followed by <tt>&amp;</tt>
character, this argument consumes everything up to the end of line, including
commas.
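
<div class="p"><!----></div>
Both ways of passing a value that contains commas can be sketched as follows (the <tt>table</tt> and <tt>rest</tt> macroinstructions and the labels are hypothetical):

<pre>
&nbsp;&nbsp;&nbsp;&nbsp;macro&nbsp;table&nbsp;name,values&nbsp;{&nbsp;name&nbsp;db&nbsp;values&nbsp;}
&nbsp;&nbsp;&nbsp;&nbsp;macro&nbsp;rest&nbsp;name,values&amp;&nbsp;{&nbsp;name&nbsp;db&nbsp;values&nbsp;}

&nbsp;&nbsp;&nbsp;&nbsp;table&nbsp;digits,&lt;1,2,3&gt;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;;&nbsp;becomes:&nbsp;digits&nbsp;db&nbsp;1,2,3
&nbsp;&nbsp;&nbsp;&nbsp;rest&nbsp;letters,'a','b','c'&nbsp;&nbsp;&nbsp;;&nbsp;becomes:&nbsp;letters&nbsp;db&nbsp;'a','b','c'

</pre>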

<div class="p"><!----></div>
<a class="a" id="PURGE"></a>
<tt>purge</tt> directive removes the last definition of the specified
macroinstruction. It should be followed by one or more names of
macroinstructions, separated with commas. If such a macroinstruction has not
been defined, you will not get any error. For example after having the syntax of
<tt>mov</tt> extended with the macroinstructions defined above, you can disable
the syntax with three operands again by using the <tt>purge&nbsp;mov</tt> directive. The next
<tt>purge&nbsp;mov</tt> will also disable the syntax for two operands being segment
registers, and all further such directives will do nothing.
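
<div class="p"><!----></div>
The same behavior can be sketched in isolation with a hypothetical macroinstruction that temporarily overrides the <tt>nop</tt> instruction:

<pre>
&nbsp;&nbsp;&nbsp;&nbsp;macro&nbsp;nop&nbsp;{&nbsp;xchg&nbsp;ebx,ebx&nbsp;}
&nbsp;&nbsp;&nbsp;&nbsp;nop&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;;&nbsp;expands&nbsp;to&nbsp;xchg&nbsp;ebx,ebx
&nbsp;&nbsp;&nbsp;&nbsp;purge&nbsp;nop&nbsp;&nbsp;&nbsp;&nbsp;;&nbsp;remove&nbsp;the&nbsp;macroinstruction
&nbsp;&nbsp;&nbsp;&nbsp;nop&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;;&nbsp;the&nbsp;plain&nbsp;instruction&nbsp;again
&nbsp;&nbsp;&nbsp;&nbsp;purge&nbsp;nop&nbsp;&nbsp;&nbsp;&nbsp;;&nbsp;nothing&nbsp;left&nbsp;to&nbsp;remove,&nbsp;no&nbsp;error

</pre>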

<div class="p"><!----></div>
If after the <tt>macro</tt> directive you enclose a group of argument declarations
in square brackets, it will allow giving more values for this group of
arguments when using that macroinstruction. Any additional argument following the
last argument of such group will start the new group and will become the
first argument of it. For this reason after the closing square bracket no more
argument names can follow. The contents of macroinstruction will be processed
for each such group of arguments separately. The simplest example is to
enclose one argument name in square brackets:

<pre>
&nbsp;&nbsp;&nbsp;&nbsp;macro&nbsp;stoschar&nbsp;[char]
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;{
................................................................................
&nbsp;&nbsp;&nbsp;&nbsp;stosb
&nbsp;&nbsp;&nbsp;&nbsp;mov&nbsp;al,3
&nbsp;&nbsp;&nbsp;&nbsp;stosb

</pre>

<div class="p"><!----></div>
<a class="a" id="LOCAL"></a>
There are some special directives available only inside the definitions of
macroinstructions. <tt>local</tt> directive defines local names, which will be
replaced with unique values each time the macroinstruction is used. It should
be followed by names separated with commas. If the name given as parameter to <tt>local</tt> directive begins with a dot or two
dots, the unique labels generated by each evaluation of macroinstruction will
have the same properties. This directive is usually needed
for the constants or labels that macroinstruction defines and uses
internally.
................................................................................

</pre>
Each time this macroinstruction is used, <tt>move</tt> will become a different
unique name in its instructions, so you will not get the error you normally get
when some label is defined more than once.

<div class="p"><!----></div>

<a class="a" id="FORWARD"></a>
<a class="a" id="REVERSE"></a>
<a class="a" id="COMMON"></a>
<tt>forward</tt>, <tt>reverse</tt> and <tt>common</tt> directives divide
macroinstruction into blocks, each one processed after the processing of
previous is finished. They differ in behavior only if macroinstruction allows
multiple groups of arguments. Block of instructions that follows
<tt>forward</tt> directive is processed for each group of arguments, from
first to last - exactly like the default block (not preceded by any of these
directives). Block that follows <tt>reverse</tt> directive is processed
for each group of arguments in reverse order - from last to first. Block that
................................................................................
</pre>
It is a very simplified kind of macroinstruction and it simply delegates a
block of instructions to be put at the end. 

<div class="p"><!----></div>
     <a id="tth_sEc2.3.4"></a><h3>
2.3.4&nbsp;&nbsp;Structures</h3>
<a class="a" id="STRUC"></a>

<tt>struc</tt> directive is a special variant of <tt>macro</tt> directive that is
used to define data structures. Macroinstruction defined using the
<tt>struc</tt> directive must be preceded by a label (like the data definition
directive) when it's used. This label will be also attached at the beginning
of every name starting with dot in the contents of macroinstruction. The
macroinstruction defined using the <tt>struc</tt> directive can have the same
name as some other macroinstruction defined using the <tt>macro</tt> directive,
................................................................................

<div class="p"><!----></div>
Defining data structures addressed by registers or absolute values should be
done using the <tt>virtual</tt> directive with structure macroinstruction
(see <a href="#sec:other">2.2.5</a>).

<div class="p"><!----></div>
<a class="a" id="RESTRUC"></a>
<tt>restruc</tt> directive removes the last definition of the structure, just like
<tt>purge</tt> does with macroinstructions and <tt>restore</tt> with symbolic constants.
It also has the same syntax - should be followed by one or more names of
structure macroinstructions, separated with commas.

<div class="p"><!----></div>
     <a id="tth_sEc2.3.5"></a><h3>
2.3.5&nbsp;&nbsp;Repeating macroinstructions</h3>
<a class="a" id="REPT"></a>

The <tt>rept</tt> directive is a special kind of macroinstruction, which makes given
amount of duplicates of the block enclosed with braces. The basic syntax is
<tt>rept</tt> directive followed by number and then block of source enclosed between
the <tt>{</tt> and <tt>}</tt> characters. The simplest example:

<pre>
&nbsp;&nbsp;&nbsp;&nbsp;rept&nbsp;5&nbsp;{&nbsp;in&nbsp;al,dx&nbsp;}
................................................................................
of expression associated with symbolic constant is calculated first, and then
substituted into the outer expression in place of that constant). If you need
repetitions based on values that can only be calculated at assembly time, use
one of the code repeating directives that are processed by assembler, see
section <a href="#sec:repeating">2.2.3</a>.

<div class="p"><!----></div>
<a class="a" id="IRP"></a>
The <tt>irp</tt> directive iterates a single argument through the given list of
parameters. The syntax is <tt>irp</tt> followed by the argument name, then a comma
and then the list of parameters. The parameters are specified in the same
way as in the invocation of a standard macroinstruction, so they have to be
separated with commas and each one can be enclosed with the <tt>&lt;</tt> and <tt>&gt;</tt>
characters. Also the name of the argument may be followed by <tt>*</tt> to mark that it
cannot get an empty value. Such block:

................................................................................

<pre>
&nbsp;&nbsp;&nbsp;db&nbsp;2
&nbsp;&nbsp;&nbsp;db&nbsp;3
&nbsp;&nbsp;&nbsp;db&nbsp;5

</pre>
<a class="a" id="IRPS"></a>

The <tt>irps</tt> directive iterates through the given list of symbols, it should
be followed by the argument name, then the comma and then the sequence of any
symbols. Each symbol in this sequence, no matter whether it is the name
symbol, symbol character or quoted string, becomes an argument value for one
iteration. If there are no symbols following the comma, no iteration is done
at all. This example:

................................................................................

<pre>
&nbsp;&nbsp;&nbsp;xor&nbsp;al,al
&nbsp;&nbsp;&nbsp;xor&nbsp;bx,bx
&nbsp;&nbsp;&nbsp;xor&nbsp;ecx,ecx

</pre>
<a class="a" id="IRPV"></a>

The <tt>irpv</tt> directive iterates through all of the values that were assigned to
the given symbolic variable. It should be followed by the argument name and
the name of symbolic variable, separated with comma. When the symbolic
variable is treated with <tt>restore</tt> directive to remove its latest value, that
value is removed from the list of values accessed by <tt>irpv</tt>. But any
modifications made to that list during the iterations performed by <tt>irpv</tt> (by
either defining a new value for symbolic variable, or destroying the value
with <tt>restore</tt> directive) do not affect the operation performed by this
directive - the list that gets iterated reflects the state of symbolic
variable at the time when <tt>irpv</tt> directive was encountered. For example this
snippet restores a symbolic variable called <tt>d</tt> to its initial state, before
any values were assigned to it:

<pre>
&nbsp;&nbsp;&nbsp;irpv&nbsp;value,&nbsp;d
&nbsp;&nbsp;&nbsp;&nbsp;{&nbsp;restore&nbsp;d&nbsp;}

</pre>
It simply generates as many copies of the <tt>restore</tt> directive as there are
values to remove.

<div class="p"><!----></div>
The blocks defined by the <tt>irp</tt>, <tt>irps</tt> and <tt>irpv</tt> directives are also processed in
the same way as any macroinstructions, so operators and directives specific
to macroinstructions may be freely used also in this case.

<div class="p"><!----></div>
     <a id="tth_sEc2.3.6"></a><h3>
2.3.6&nbsp;&nbsp;Conditional preprocessing</h3>
<a id="sec:conditional_preprocessing">
</a>
<a class="a" id="MATCH"></a>

<tt>match</tt> directive causes some block of source to be preprocessed and passed
to assembler only when the given sequence of symbols matches the specified
pattern. The pattern comes first, ended with comma, then the symbols
that have to be matched with the pattern, and finally the block of
source, enclosed within braces as macroinstruction.
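
<div class="p"><!----></div>
A common use is conditional inclusion of code depending on a symbolic constant. In this sketch <tt>DEBUG</tt> is a hypothetical constant, and the <tt>=</tt> character in front of <tt>TRUE</tt> is the flat assembler syntax for requiring that exact symbol in the pattern, which is not covered by the description above:

<pre>
&nbsp;&nbsp;&nbsp;&nbsp;DEBUG&nbsp;equ&nbsp;TRUE
&nbsp;&nbsp;&nbsp;&nbsp;match&nbsp;=TRUE,&nbsp;DEBUG&nbsp;{&nbsp;display&nbsp;'debug&nbsp;build',13,10&nbsp;}

</pre>
The symbolic constant after the comma is replaced with its value before matching, so the block is preprocessed only when <tt>DEBUG</tt> has the value <tt>TRUE</tt>.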

<div class="p"><!----></div>
................................................................................
on.

<div class="p"><!----></div>
 <a id="tth_sEc2.4"></a><h2>
2.4&nbsp;&nbsp;Formatter directives</h2>
<a id="sec:formatter">
</a>
<a class="a" id="FORMAT"></a>

These directives are actually also a kind of control directives, with the
purpose of controlling the format of generated code.

<div class="p"><!----></div>
<tt>format</tt> directive followed by the format identifier selects
the output format. This directive should be put at the beginning of the
source. Default output format is a flat binary file, it can also be selected
................................................................................
by using <tt>format&nbsp;binary</tt> directive.
This directive can be followed by the <tt>as</tt> keyword
and the quoted string specifying the default file extension for the output
file. Unless the output file name was specified from the command line,
assembler will use this extension when generating the output file.
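
<div class="p"><!----></div>
For example, a flat binary disk image could be given the <tt>.img</tt> extension (the extension itself is an arbitrary choice made for this sketch):

<pre>
&nbsp;&nbsp;&nbsp;&nbsp;format&nbsp;binary&nbsp;as&nbsp;'img'

</pre>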

<div class="p"><!----></div>
<a class="a" id="USE16__USE32__USE64"></a>
<tt>use16</tt> and <tt>use32</tt> directives force the assembler to generate 16-bit or
32-bit code, overriding the default setting for the selected output format. <tt>use64</tt>
enables generating the code for the long mode of x86-64 processors.
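
<div class="p"><!----></div>
The setting changes how the same instruction is encoded, which this short sketch illustrates:

<pre>
&nbsp;&nbsp;&nbsp;&nbsp;use16
&nbsp;&nbsp;&nbsp;&nbsp;mov&nbsp;ax,bx&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;;&nbsp;16-bit&nbsp;operands&nbsp;are&nbsp;the&nbsp;default,&nbsp;no&nbsp;prefix&nbsp;needed
&nbsp;&nbsp;&nbsp;&nbsp;use32
&nbsp;&nbsp;&nbsp;&nbsp;mov&nbsp;ax,bx&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;;&nbsp;assembled&nbsp;with&nbsp;the&nbsp;66h&nbsp;operand-size&nbsp;prefix
&nbsp;&nbsp;&nbsp;&nbsp;mov&nbsp;eax,ebx&nbsp;&nbsp;&nbsp;;&nbsp;32-bit&nbsp;operands&nbsp;are&nbsp;the&nbsp;default,&nbsp;no&nbsp;prefix&nbsp;needed

</pre>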

<div class="p"><!----></div>
Below are described different output formats with the directives
specific to these formats.

................................................................................
<div class="p"><!----></div>
     <a id="tth_sEc2.4.1"></a><h3>
2.4.1&nbsp;&nbsp;MZ executable</h3>
To select the MZ output format, use <tt>format&nbsp;MZ</tt> directive. The default
code setting for this format is 16-bit.

<div class="p"><!----></div>
<a class="a" id="SEGMENT"></a>
<tt>segment</tt> directive defines a new segment. It should be followed by a
label, whose value will be the number of the defined segment; optionally the
<tt>use16</tt> or <tt>use32</tt> word can follow to specify whether code in this
segment should be 16-bit or 32-bit. The origin of the segment is aligned to a
paragraph (16 bytes). All the labels defined then will have values relative
to the beginning of this segment.

<div class="p"><!----></div>
<a class="a" id="ENTRY"></a>
<tt>entry</tt> directive sets the entry point for MZ executable, it should be
followed by the far address (name of segment, colon and the offset inside
segment) of desired entry point.

<div class="p"><!----></div>
<a class="a" id="STACK"></a>
<tt>stack</tt> directive sets up the stack for MZ executable. It can be
followed by numerical expression specifying the size of stack to be created
automatically or by the far address of initial stack frame when you want to
set up the stack manually. When no stack is defined, the stack of default
size 4096 bytes will be created.

<div class="p"><!----></div>
<a class="a" id="HEAP"></a>
<tt>heap</tt> directive should be followed by a 16-bit value defining maximum
size of additional heap in paragraphs (this is heap in addition to stack and
undefined data). Use <tt>heap&nbsp;0</tt> to always allocate only memory program
really needs. Default size of heap is 65535.
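
<div class="p"><!----></div>
Put together, a small two-segment DOS program might be sketched like this. The segment and label names are hypothetical, and the stack size is an arbitrary choice:

<pre>
&nbsp;&nbsp;&nbsp;&nbsp;format&nbsp;MZ
&nbsp;&nbsp;&nbsp;&nbsp;entry&nbsp;main:start&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;;&nbsp;far&nbsp;address&nbsp;of&nbsp;the&nbsp;entry&nbsp;point
&nbsp;&nbsp;&nbsp;&nbsp;stack&nbsp;100h&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;;&nbsp;256&nbsp;bytes&nbsp;of&nbsp;stack&nbsp;created&nbsp;automatically
&nbsp;&nbsp;&nbsp;&nbsp;heap&nbsp;0&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;;&nbsp;allocate&nbsp;only&nbsp;what&nbsp;the&nbsp;program&nbsp;needs

&nbsp;&nbsp;&nbsp;&nbsp;segment&nbsp;main
&nbsp;&nbsp;start:
&nbsp;&nbsp;&nbsp;&nbsp;mov&nbsp;ax,text&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;;&nbsp;the&nbsp;label&nbsp;of&nbsp;a&nbsp;segment&nbsp;is&nbsp;its&nbsp;number
&nbsp;&nbsp;&nbsp;&nbsp;mov&nbsp;ds,ax
&nbsp;&nbsp;&nbsp;&nbsp;mov&nbsp;dx,hello&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;;&nbsp;offset&nbsp;relative&nbsp;to&nbsp;the&nbsp;text&nbsp;segment
&nbsp;&nbsp;&nbsp;&nbsp;mov&nbsp;ah,9
&nbsp;&nbsp;&nbsp;&nbsp;int&nbsp;21h
&nbsp;&nbsp;&nbsp;&nbsp;mov&nbsp;ax,4C00h
&nbsp;&nbsp;&nbsp;&nbsp;int&nbsp;21h

&nbsp;&nbsp;&nbsp;&nbsp;segment&nbsp;text
&nbsp;&nbsp;hello&nbsp;db&nbsp;'Hello',24h

</pre>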

<div class="p"><!----></div>
     <a id="tth_sEc2.4.2"></a><h3>
2.4.2&nbsp;&nbsp;Portable Executable</h3>
................................................................................
To select the Portable Executable output format, use <tt>format&nbsp;PE</tt> directive,
it can be followed by additional format settings: first the target subsystem
setting, which can be <tt>console</tt> or <tt>GUI</tt> for Windows applications, <tt>native</tt>
for Windows drivers, <tt>EFI</tt>, <tt>EFIboot</tt> or <tt>EFIruntime</tt> for the UEFI, it may be
followed by the minimum version of system that the executable is targeted to
(specified in form of floating-point value). Optional <tt>DLL</tt> and <tt>WDM</tt> keywords
mark the output file as a dynamic link library and WDM driver respectively,
the <tt>large</tt> keyword marks the executable as able to handle addresses
larger than 2 GB and the <tt>NX</tt> keyword signals that the executable conforms to the
restriction of not executing code residing in non-executable sections.

<div class="p"><!----></div>
These settings can be followed by the <tt>at</tt> operator and a numerical expression
specifying the base of the PE image, and then optionally by the <tt>on</tt> operator followed by
a quoted string containing the file name of a custom MZ stub for the PE program
(when the specified file is not an MZ executable, it is treated as a flat binary
executable file and converted into MZ format). The default code setting for
................................................................................

<div class="p"><!----></div>
To create PE file for the x86-64 architecture, use <tt>PE64</tt> keyword instead of
<tt>PE</tt> in the format declaration, in such case the long mode code is generated
by default.
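
<div class="p"><!----></div>
A few typical declarations are sketched below. They are alternatives, since a source contains only one <tt>format</tt> directive, and the version numbers are arbitrary choices:

<pre>
&nbsp;&nbsp;&nbsp;&nbsp;format&nbsp;PE&nbsp;GUI&nbsp;4.0&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;;&nbsp;32-bit&nbsp;Windows&nbsp;GUI&nbsp;application
&nbsp;&nbsp;&nbsp;&nbsp;format&nbsp;PE&nbsp;console&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;;&nbsp;32-bit&nbsp;console&nbsp;application
&nbsp;&nbsp;&nbsp;&nbsp;format&nbsp;PE&nbsp;GUI&nbsp;4.0&nbsp;DLL&nbsp;&nbsp;&nbsp;;&nbsp;32-bit&nbsp;dynamic&nbsp;link&nbsp;library
&nbsp;&nbsp;&nbsp;&nbsp;format&nbsp;PE64&nbsp;GUI&nbsp;5.0&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;;&nbsp;64-bit&nbsp;GUI&nbsp;application,&nbsp;long&nbsp;mode&nbsp;code&nbsp;by&nbsp;default

</pre>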

<div class="p"><!----></div>
<a class="a" id="SECTION"></a>
The <tt>section</tt> directive defines a new section. It should be
followed by a quoted string defining the name of the section, then one
or more section flags can follow. Available flags are:
<tt>code</tt>, <tt>data</tt>, <tt>readable</tt>, <tt>writeable</tt>,
<tt>executable</tt>, <tt>shareable</tt>, <tt>discardable</tt>,
<tt>notpageable</tt>. The origin of a section is aligned to a page (4096
bytes). Example declaration of a PE section:

................................................................................
<pre>
&nbsp;&nbsp;&nbsp;&nbsp;section&nbsp;'.reloc'&nbsp;data&nbsp;readable&nbsp;discardable&nbsp;fixups
&nbsp;&nbsp;&nbsp;&nbsp;section&nbsp;'.rsrc'&nbsp;data&nbsp;readable&nbsp;resource&nbsp;from&nbsp;'my.res'

</pre>

<div class="p"><!----></div>
<a class="a" id="ENTRY"></a>
The <tt>entry</tt> directive sets the entry point for a Portable Executable; the
value of the entry point should follow.
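
<div class="p"><!----></div>
The following minimal source is a sketch (not part of the original manual) of how
the <tt>format</tt>, <tt>entry</tt> and <tt>section</tt> directives fit together.
The label name <tt>start</tt> is an arbitrary choice, and the program does nothing
but return zero from its entry point:

<pre>
&nbsp;&nbsp;&nbsp;&nbsp;format&nbsp;PE&nbsp;console
&nbsp;&nbsp;&nbsp;&nbsp;entry&nbsp;start&nbsp;&nbsp;;&nbsp;execution&nbsp;begins&nbsp;at&nbsp;the&nbsp;label&nbsp;defined&nbsp;below

&nbsp;&nbsp;&nbsp;&nbsp;section&nbsp;'.text'&nbsp;code&nbsp;readable&nbsp;executable

&nbsp;&nbsp;start:
&nbsp;&nbsp;&nbsp;&nbsp;xor&nbsp;eax,eax&nbsp;&nbsp;;&nbsp;zero&nbsp;as&nbsp;the&nbsp;return&nbsp;value
&nbsp;&nbsp;&nbsp;&nbsp;ret

</pre>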

<div class="p"><!----></div>
<a class="a" id="STACK"></a>
The <tt>stack</tt> directive sets up the size of the stack for a Portable Executable.
The value of the stack reserve size should follow, optionally followed by the value of
the stack commit, separated with a comma. When the stack is not defined, it is set by
default to a size of 4096 bytes.

<div class="p"><!----></div>
<a class="a" id="HEAP"></a>
The <tt>heap</tt> directive chooses the size of the heap for a Portable Executable. The value
of the heap reserve size should follow, optionally followed by the value of the heap commit,
separated with a comma. When no heap is defined, it is set by default to a size
of 65536 bytes; when the size of the heap commit is unspecified, it is set by default
to zero.
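
<div class="p"><!----></div>
A short sketch of both directives, with sizes chosen only for illustration
(they are assumptions, not recommended values):

<pre>
&nbsp;&nbsp;&nbsp;&nbsp;stack&nbsp;10000h,1000h&nbsp;&nbsp;;&nbsp;reserve&nbsp;65536&nbsp;bytes&nbsp;of&nbsp;stack,&nbsp;commit&nbsp;4096&nbsp;bytes
&nbsp;&nbsp;&nbsp;&nbsp;heap&nbsp;100000h&nbsp;&nbsp;;&nbsp;reserve&nbsp;1048576&nbsp;bytes&nbsp;of&nbsp;heap,&nbsp;commit&nbsp;left&nbsp;at&nbsp;its&nbsp;default&nbsp;of&nbsp;zero

</pre>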

<div class="p"><!----></div>
<a class="a" id="DATA"></a>
<a class="a" id="END"></a>
The <tt>data</tt> directive begins the definition of special PE data. It should be
followed by one of the data identifiers (<tt>export</tt>, <tt>import</tt>,
<tt>resource</tt> or <tt>fixups</tt>) or by the number of the data entry in the PE
header. The data should be defined in the following lines and ended with the <tt>end&nbsp;data</tt>
directive. When the fixups data definition is chosen, the data is generated
automatically and no more data needs to be defined there.
The same applies to the resource data when the <tt>resource</tt>
identifier is followed by the <tt>from</tt> operator and a quoted file name -
................................................................................
directive, depending on whether you want to create the classic (DJGPP) or Microsoft's
variant of a COFF file. The default code setting for this format is 32-bit. To
create a file in Microsoft's COFF format for the x86-64 architecture, use the
<tt>format&nbsp;MS64&nbsp;COFF</tt> setting; in that case long mode code is generated by
default.

<div class="p"><!----></div>
<a class="a" id="SECTION"></a>
The <tt>section</tt> directive defines a new section. It should be followed by
a quoted string defining the name of the section, then one or more section flags
can follow. Section flags available for both COFF variants are <tt>code</tt> and <tt>data</tt>,
while the flags <tt>readable</tt>, <tt>writeable</tt>, <tt>executable</tt>, <tt>shareable</tt>, <tt>discardable</tt>,
<tt>notpageable</tt>, <tt>linkremove</tt> and <tt>linkinfo</tt> are available only with
Microsoft's COFF variant.

<div class="p"><!----></div>
By default a section is aligned
to a double word (four bytes). In the case of the Microsoft COFF variant another alignment
can be specified by providing the <tt>align</tt> operator followed by an alignment value
(any power of two up to 8192) among the section flags.
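
<div class="p"><!----></div>
For instance, assuming the Microsoft variant is selected with
<tt>format&nbsp;MS&nbsp;COFF</tt>, sections with explicit alignment might be declared as in
the sketch below (the section names and alignment values are illustrative choices):

<pre>
&nbsp;&nbsp;&nbsp;&nbsp;format&nbsp;MS&nbsp;COFF

&nbsp;&nbsp;&nbsp;&nbsp;section&nbsp;'.text'&nbsp;code&nbsp;readable&nbsp;executable&nbsp;align&nbsp;16
&nbsp;&nbsp;&nbsp;&nbsp;section&nbsp;'.data'&nbsp;data&nbsp;readable&nbsp;writeable&nbsp;align&nbsp;4

</pre>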

<div class="p"><!----></div>
<a class="a" id="EXTRN"></a>
The <tt>extrn</tt> directive defines an external symbol. It should be
followed by the name of the symbol and optionally the size operator
specifying the size of the data labeled by this symbol. The name of
the symbol can also be preceded by a quoted string containing the name of
the external symbol and the <tt>as</tt> operator. Some example
declarations of external symbols:

<pre>
&nbsp;&nbsp;&nbsp;&nbsp;extrn&nbsp;exit
&nbsp;&nbsp;&nbsp;&nbsp;extrn&nbsp;'__imp__MessageBoxA@16'&nbsp;as&nbsp;MessageBox:dword

</pre>

<div class="p"><!----></div>
<a class="a" id="PUBLIC"></a>
The <tt>public</tt> directive declares an existing symbol as public. It
should be followed by the name of the symbol; optionally it can be
followed by the <tt>as</tt> operator and a quoted string
containing the name under which the symbol should be available as public.
Some examples of public symbol declarations:

<pre>
&nbsp;&nbsp;&nbsp;&nbsp;public&nbsp;main
................................................................................
2.4.4&nbsp;&nbsp;Executable and Linkable Format</h3>
To select the ELF output format, use the <tt>format&nbsp;ELF</tt> directive. The default code
setting for this format is 32-bit. To create an ELF file for the x86-64
architecture, use the <tt>format&nbsp;ELF64</tt> directive; in that case long mode code is
generated by default.

<div class="p"><!----></div>
<a class="a" id="SECTION"></a>
The <tt>section</tt> directive defines a new section. It should be followed by a quoted
string defining the name of the section, then one or both of the
<tt>executable</tt> and <tt>writeable</tt> flags can follow, optionally also the <tt>align</tt> operator
followed by a number specifying the alignment of the section (it has to be a power of
two). If no alignment is specified, the default value is used, which is 4 or 8,
depending on which format variant has been chosen.
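
<div class="p"><!----></div>
A sketch of ELF section declarations with explicit alignment (the section names
and alignment values are illustrative assumptions):

<pre>
&nbsp;&nbsp;&nbsp;&nbsp;format&nbsp;ELF64

&nbsp;&nbsp;&nbsp;&nbsp;section&nbsp;'.text'&nbsp;executable&nbsp;align&nbsp;16
&nbsp;&nbsp;&nbsp;&nbsp;section&nbsp;'.data'&nbsp;writeable&nbsp;align&nbsp;8

</pre>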

<div class="p"><!----></div>
<a class="a" id="EXTRN"></a>
<a class="a" id="PUBLIC"></a>
The <tt>extrn</tt> and <tt>public</tt> directives have the same meaning and syntax as
when the COFF output format is selected (described in the previous section).
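
<div class="p"><!----></div>
As a hedged sketch of both directives in a 32-bit ELF object, the source below
exports <tt>main</tt> and imports <tt>puts</tt>. It assumes the object is later
linked against a C library by an external linker; the label <tt>msg</tt> and the
text of the message are made up for this example:

<pre>
&nbsp;&nbsp;&nbsp;&nbsp;format&nbsp;ELF

&nbsp;&nbsp;&nbsp;&nbsp;public&nbsp;main&nbsp;&nbsp;;&nbsp;make&nbsp;the&nbsp;label&nbsp;visible&nbsp;to&nbsp;the&nbsp;linker
&nbsp;&nbsp;&nbsp;&nbsp;extrn&nbsp;puts&nbsp;&nbsp;;&nbsp;symbol&nbsp;resolved&nbsp;at&nbsp;link&nbsp;time

&nbsp;&nbsp;&nbsp;&nbsp;section&nbsp;'.text'&nbsp;executable

&nbsp;&nbsp;main:
&nbsp;&nbsp;&nbsp;&nbsp;sub&nbsp;esp,8&nbsp;&nbsp;;&nbsp;padding&nbsp;so&nbsp;the&nbsp;stack&nbsp;is&nbsp;16-byte&nbsp;aligned&nbsp;at&nbsp;the&nbsp;call
&nbsp;&nbsp;&nbsp;&nbsp;push&nbsp;msg&nbsp;&nbsp;;&nbsp;argument:&nbsp;address&nbsp;of&nbsp;the&nbsp;string
&nbsp;&nbsp;&nbsp;&nbsp;call&nbsp;puts
&nbsp;&nbsp;&nbsp;&nbsp;add&nbsp;esp,12&nbsp;&nbsp;;&nbsp;drop&nbsp;the&nbsp;argument&nbsp;and&nbsp;the&nbsp;padding
&nbsp;&nbsp;&nbsp;&nbsp;xor&nbsp;eax,eax&nbsp;&nbsp;;&nbsp;return&nbsp;0
&nbsp;&nbsp;&nbsp;&nbsp;ret

&nbsp;&nbsp;&nbsp;&nbsp;section&nbsp;'.data'&nbsp;writeable

&nbsp;&nbsp;msg&nbsp;db&nbsp;'Hello&nbsp;from&nbsp;an&nbsp;ELF&nbsp;object',0

</pre>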

<div class="p"><!----></div>
The <tt>rva</tt> operator can also be used with this format (however not
when the target architecture is x86-64). It converts the address into an offset
relative to the GOT table, so it may be useful for creating position-independent
code. There's also a special <tt>plt</tt> operator, which allows to call the external
................................................................................
&nbsp;&nbsp;.repeat
&nbsp;&nbsp;&nbsp;&nbsp;add&nbsp;ecx,2
&nbsp;&nbsp;.until&nbsp;ecx&#62;100

</pre>

<div class="p"><!----></div>







Changes to doc/index.help.

; The format of this file is:  TITLE | FILENAME
; All files are searched from %FreshHelp% path.
; The lines starting with ";" is comment. The empty lines are ignored.
;

Help index                              |       index.md
x86 instruction set                     |       x86/x86.idx
FASM manual                             |       FASM.html
FreshLib reference                      |       FreshLibRef.md
FreshLib user guide                     |       FreshLibUserGuide.md
FreshLib object oriented programming    |       FreshLibOOP.md
Fresh IDE - advanced setup              |       advanced_setup.md
Fresh IDE - tips and tricks             |       tips.md
Linux system calls reference            |       lscr/index.idx







; The format of this file is:  TITLE | FILENAME
; All files are searched from %FreshHelp% path.
; The lines starting with ";" is comment. The empty lines are ignored.
;

Help index                              |       index.md
x86 instruction set                     |       x86/x86.idx
FASM manual                             |       FASM.rhtm
FreshLib reference                      |       FreshLibRef.md
FreshLib user guide                     |       FreshLibUserGuide.md
FreshLib object oriented programming    |       FreshLibOOP.md
Fresh IDE - advanced setup              |       advanced_setup.md
Fresh IDE - tips and tricks             |       tips.md
Linux system calls reference            |       lscr/index.idx

Changes to doc/index.md.

  [FreshLibUserGuide.md][FreshLib user guide] - User guide explaining how to use FreshLib for
portable programming in assembly language.

  [FreshLibOOP.md][FreshLib OOP manual] - Explains the FreshLib object oriented programming model.

  [tips.md][Fresh IDE tips and tricks] - How to use Fresh IDE for the best programming results.

  [FASM.html][Flat assembler programmers manual] - The official FlatAssembler reference manual.

  [lscr/main.html][Linux 32bit system calls reference] - An assembly language reference to the Linux system calls.







  [FreshLibUserGuide.md][FreshLib user guide] - User guide explaining how to use FreshLib for
portable programming in assembly language.

  [FreshLibOOP.md][FreshLib OOP manual] - Explains the FreshLib object oriented programming model.

  [tips.md][Fresh IDE tips and tricks] - How to use Fresh IDE for the best programming results.

  [FASM.rhtm][Flat assembler programmers manual] - The official FlatAssembler reference manual.

  [lscr/main.html][Linux 32bit system calls reference] - An assembly language reference to the Linux system calls.

Deleted doc/source/0_advanced_setup.txt.

Setup manual
# Chapter 4 Fresh IDE setup

    The upcoming version 3.0 of Fresh IDE will be fully portable, but until then,
we have to use the Windows version of Fresh.

    Nevertheless, developing Windows and Linux applications at the same time is possible
and easy with the current version of Fresh.

    The good news is that Fresh IDE runs like a charm in [Wine].

    So, now (since v2.0.5) we have a choice - whatever OS we choose we can develop applications
for Linux and Windows, including editing, compiling, running and debugging. Of course we are
talking about GUI applications.

    The installation of Fresh is easy, but if we want to use the full potential of the IDE,
some setup and adjustments are necessary. Let's see how to set up Fresh IDE for work in Windows
and Linux:

[wine] http://www.winehq.org/


## Windows setup

    In order to run native Linux applications inside Windows, Fresh uses a special Linux
distribution - [andlinux].

    *andLinux* is a complete Ubuntu distribution that uses the coLinux kernel in order to allow
running Linux inside Windows.

[andLinux] http://andlinux.org
[andLinux download page] http://andlinux.org/downloads.php

    How to set up andLinux to work with Fresh?

### Download andLinux

    Download the andLinux package from the [andLinux download page].

    There are two packages available: "KDE version" and "minimal/XFCE version". For use with
Fresh IDE, it is not important which version you choose.

    The KDE package is a very big and very slow distribution that contains many bundled programs
and tools, but in general you will not need them. The KDE package is a 500MB download
and takes 5GB installed.

    The XFCE package is relatively smaller and faster. Relatively means a 200MB download
and 2GB installed on the disk.

    The XFCE package is therefore the better choice.

### Install andLinux

    Run the downloaded setup file and answer the questions of the
setup wizard:

    * coLinux version - choose the stable version (0.7.4 in my case) instead of the latest (0.8.0) -
we are going to work with andLinux, not play with it.

    * Memory size - 256MB RAM (or maybe more - if you can afford it).

    * Install XMing server on your primary screen.

    * Sound - you can enable or disable sound in Linux option - it is harmless although
it is one more running server.

    * Startup type + Panel - select *"run andLinux automatically as a NT service" + "use Windows shortcuts"*;
It is not very important, but can save you a little manual work and troubles later.

    * andLinux login - just select your user name and password for Linux root.

    * Windows File Access - it is important! - select *"using CoFS"*, even though it is not recommended.

    * File Access Using CoFS - important! - create a new directory somewhere and select it to
be mounted via CoFS. This will be the shared directory, visible from Windows and from Linux at the
same time.

    All other features you can choose freely or simply leave at their defaults.

    When you start the installation, the installer will try to install a network driver.
Windows may protest and attempt to dissuade you from installing
an uncertified driver. You must ignore these attempts and firmly click "install".

    After the installation of andLinux, you have to restart Windows, and andLinux will probably
not run. :)

    This is because of the Windows firewall. All new network adapters are firewalled
by default. Since the installed adapter (named TAP-Colinux) is virtual and local, you will
not need any firewall for it, so go to "Control panel/Windows firewall/Advanced" and uncheck
the "TAP-Colinux" adapter in the list of adapters.

    Then you can run some Linux program - in the Windows tray, there is an andLinux menu icon
that has shortcuts to several Linux programs.


### Additional tools

    Install several additional Linux tools. You will additionally need a debugger and
a decent terminal emulator. I chose xterm as the terminal, because it is small and white by
default. ;)

    You can choose to use the built-in terminal, named in the simple Linux manner
"xfce4-terminal", and the console debugger "gdb". In this case you can skip this step because these tools
are already installed.

    Start "Synaptic" - the package manager for Ubuntu - from the tray menu. You have to enter the root
password you chose during installation.

    When Synaptic is started, click "Reload" to refresh the package list from the network and
then use search to locate the needed programs. I personally recommend "xterm" as the terminal and "ddd"
as a GUI front end to "gdb".

    Well, I recommend "ddd" only because it is the only Linux debugger that I was able to run under
andLinux and that is able to show the disassembled code of the program.

    Mark the selected programs for installation, click the Apply button and wait until the download and
installation finish.

    Here you can encounter only one problem - when your computer is behind a proxy server.

    If the proxy is a normal proxy, you simply have to set its address and port in the
Synaptic preferences and it should work.

    It is a completely different story when the proxy is an MS ISA server configured with NTLM user
authorization.

    Most Linux programs can't work with such authorization, and Synaptic is no exception.

    Fortunately, there is a workaround for this situation. You need the [ntlmaps] authorization
proxy server.

    Setting up this server is out of the scope of this article. On the ntlmaps home page you
can read the complete documentation and explanations.

    OK, we are done with andLinux. Now you have a working copy of Ubuntu inside your Windows box.

    It's time to configure Fresh to run Linux applications inside andLinux. Continue with:

[ntlmaps] http://ntlmaps.sourceforge.net/

### Fresh IDE configuration.

    Run Fresh and open the "Options|IDE options" dialog. Select the "Debuggers and Emulators" page.
    Then set the following directories and commands:

    * "andLinux directory" - the directory where you installed andLinux.

    * "andLinux shared directory" - the shared directory you selected during the andLinux installation.

    * "Linux debugger" - Enter "ddd" (or whatever debugger you chose).

    * "Linux terminal" - Enter: `xterm -hold +mesg -e` - the options are important.

    *And voila! You finished the configuration!*

    Now you can load the source of some Linux program (for example "Fresh/examples/XLib/XLib.fpr") and run it with
shift+F9 or load it in the debugger with shift+F8.

    Now Fresh IDE will detect when you have compiled an ELF executable and will run it in andLinux instead of
Windows. Of course, Windows applications will be treated as usual.


## Linux setup

    Working in Linux requires a Wine installation. Fresh has been tested with Wine v1.2 and v1.3 and works great.
You can skip some steps if they are already done.

### Install Wine

    Install Wine - use whatever package manager is good for you. Synaptic is the usual choice.
In the "Wine configuration|Graphics" panel, uncheck the option "Allow the window manager to decorate the windows".

### Install Fresh IDE

    Install Fresh IDE v2.0.5 or newer - you can use the installation package or ZIP
archive. It is better (at least for v2.0.5) to install Fresh IDE in "c:\" but you can put it at
whatever place you like.

### Debuggers

    Install whatever debugger you will use for Linux applications and OllyDbg for
Windows applications. (OllyDbg works in Wine.)

    For Linux, I personally prefer [EDB], but any other is OK,
including "ddd" and "gdb" mentioned in the previous chapter.

[edb] http://www.codef00.com/projects#Debugger

### Configure Fresh IDE

Run Fresh IDE and if some paths are not set properly, set them manually in
"Options|IDE options" menu.

    You will need at least the following directory aliases: "Fresh", "finc" and "lib" - set
them respectively to the Fresh main directory, the Fresh "include" directory and the "freshlib"
directory, both located in the Fresh main directory. (As a rule, these paths are set up
automatically, so you don't have to do it manually. But sometimes the algorithm fails.)

    In order to work with FASM examples and programs, you can manually set "include" or any
other alias (environment variable) you prefer.

### External tools

    Set the paths to the external tools needed (of course, you have to install these
tools in advance).

    In "Options|IDE options|Debuggers and emulators" leave the "andLinux directory" and
"andLinux shared directory" fields empty (they are not needed when Fresh runs in Linux).

    Set the "Linux debugger" field to the binary file of the debugger you prefer. Add the needed
options to the line as well.

    For example, in my settings "Linux debugger" is set to `z:/usr/bin/edb --run`

    Set "Linux terminal" to preferred terminal. For example `z:/usr/bin/xterm -hold +mesg`.

    Set "Win32 external debugger" to OllyDbg.
In my case: `c:\Program files\OllyDebugger\Ollydbg.exe`.

    *All paths must be "dos style" (Note: Wine maps Linux file system to device Z: ).*

### Test it

    Try to compile and run some test applications both for Windows and Linux.
Ctrl+F9 should compile, Shift+F9 should run the application and Shift+F8 should load it
in the respective debugger. The type of the application should be automatically determined.

    *IMPORTANT NOTE:* The only known problem (for Fresh v2.0.5) is that an ELF compiled for the
first time is not set to be executable, so you will get an "access denied" message on run. The workaround
is to set the permissions manually from the file manager or with the console "chmod" command. It is a one-time
procedure - the following compilations, runs and debugs should work properly.

Deleted doc/source/1_tips.txt.

Tips and tricks
# Chapter 5 Fresh tips and tricks.

    There is still no full user guide for Fresh.

    I will write in this page some tips and tricks. Later, they will be used in the full user guide.


## Linux support

* How to develop Linux application from Windows?

* How to develop Windows application from Linux?

* What about Fresh IDE version for Linux?

    For now, Fresh is a Windows application. The versions 2.x.x will be Windows applications.
The upcoming v3.0.0 is planned to be portable - for Windows and Linux. Of course every port
will be able to create applications for every OS.

    Despite this situation, even the current versions can work in Linux through Wine.

    There is a separate [setup][setup manual] describing the setup of Fresh IDE for Linux development.

[setup] 0_advanced_setup.htm

## Goto address

    How to find the place in the source where the application crashes?

    Press Ctrl+G in order to open the "Goto address" dialog:

        [!goto][Goto address]

    Then enter the address where the application crashes and press OK. Fresh will show you the
line of the source that is compiled at the given address. The number is in the standard
FASM number format.


## Code exploration and cross reference.

There are several functions aimed at making it easy to explore the source of a big project.

### Labels explorer

After compilation, you can browse the whole labels tree from the labels explorer. Open it from the
main menu: `Project|Label explorer`.
In the label explorer you can view the labels' values and types, and where in the source these labels
were defined and used.

### Editor cross reference

If you position the text caret on some label and press Ctrl+R, or choose "Cross reference" from the
context menu, a window with the cross reference information for this label will open.

          [!cross][Cross reference]

The first row of the table displays the line of the source where the label was defined.
The following rows display the lines where the label is used. Besides the pointed label, all its
child labels will be displayed.

If you double click on some of the rows, the editor will bring you to this line of source.

The cross reference window can be closed by pressing *Esc* key.

## Goto definition

If you position the text caret on some symbol and press *Ctrl+D* or choose "Goto definition" from the
context menu, the editor will bring you to the line of the source where this symbol was
defined.
The project must be compiled for this function to work.

## Embedded help

Similar to "Goto definition" is the next function, "Embedded help".

If the definition of some symbol is preceded by a comment block describing this symbol, Fresh can
show this description anywhere in the source when you position the text caret on the symbol and
press the *Ctrl+W* shortcut.

The hint window looks like the one in the screenshot:

                    [!_images/EmbededHelp.png][Embedded help system]

The hint window can be closed by pressing *Esc* key.

## Arguments hint

    How to use "Procedure arguments hint"?

        [!procarg][Procedure argument hints]

    This function works with the call macros *stdcall, ccall, invoke and cinvoke*.

    When you type such a line and the function is known (the source has to be compiled prior to
that moment), Fresh shows a hint window that helps you to enter the arguments. This happens
automatically when you type "," somewhere in the line.

    When you want to open the hint window without typing (just to check), you can press
Ctrl+Q when the caret is on the line. You can close the hint window by pressing the *Esc* key.

    "Procedure arguments hint" works for the procedures defined in the compiled source and, if
the program uses import definitions from FreshLib, for imported functions as well.

## Code completion

    How to type less and to code more?

        [!code][Code completion]

    Code completion is a very powerful function that can save you thousands of keystrokes.
In order to use it, the source has to be compiled (at least partially - some errors are
acceptable) because Fresh has to build the tree of labels defined in your program.

    Fresh does not use fixed pre-defined lists. It uses the labels you defined in the source.

    This is the only possible solution, because Fresh is not committed to any particular OS or
platform. If you write a Win32 program, you need the Windows API and constants in the auto completion box.
If you write a Linux program, you need all the Linux constants and API functions.

    To open code completion, press the *Ctrl+Space* shortcut. Then, as you type, the content of the box
will be refreshed in order to correspond with the word you type.

    You can select a suitable element from the list with the *Up, Down, PgUp and PgDn* keys
and press *Enter* to insert it in the line.

    The auto completion list will open automatically when you type a label that exists in the list
and then press *"."* in order to select a local label.

    You can close the auto completion window with the *ESC* key.

## Project categories

    How to move a file from one category to another in the project?

    Open the file in the editor. Select the category where you want it to be moved.
Click the right mouse button on the tab of the file in order to display the context menu.
Select "Add to project" function. It will remove the file from the old category and will
move it to the current selected category.

## Fast open files

    How to open the file specified in "include" or "file" directive?

    Place the caret on the line that contains the file name and press Ctrl+Enter.
    When opening files this way, Fresh replaces the directory aliases and/or environment
variables in the filename.

    Files like `%lib%/%TargetOS%/MyFile.asm` will be opened correctly.

## Directory aliases

    How to quickly change the values of the environment variables?

    Fresh has a very flexible system for handling environment variables. Actually I prefer to call them
"Aliases", because they are aliases of directories or other text.

    You can define such aliases in three ways:

* As OS environment variables.

* As IDE-wide aliases in the Fresh IDE options dialog ("Options|IDE options|Aliases" menu).

* As Project-wide aliases in the project options ("Project|Project options" menu or Ctrl+F12 shortcut)

    When an alias is defined several times, the project aliases have the highest priority, then
the IDE aliases, and the OS environment variables have the lowest priority.

    Now on the topic: the project aliases allow rapid value changes. In order to use this
feature, you have to define the alias with several values, delimited with the "|" symbol.
The number of different values is not limited.

    One typical example is the alias %TargetOS% used in the FreshLib test project.
This alias may have as values the names of the OSes the project will be compiled for.
So, in order to switch quickly from Win32 to Linux we can define TargetOS as "Win32|Linux".

    In this case, a popup menu appears at the top of the project window, where you can select which of these
values is to be used. Additionally, the "Settings:" panel will display the currently selected values.

       [!alias][Alias fast change]

## Portable applications

    How to write portable assembly applications with Fresh IDE?

    Most people think that assembly language can't be portable. This is not the case. There are many
examples showing that assembly can be portable. It is only a matter of project design, not of the language.

    In order to allow creating portable applications, FreshLib was created. It is a work in progress,
but even now it allows creating small applications that can be compiled for the Win32 and Linux platforms
without changing the source.

    Read the unfinished [FreshLibRef][FreshLib reference manual]
and check the sources of FreshLib in the "Fresh/freshlib" directory of your Fresh IDE installation.
There is a working test project called TestFreshLib.fpr that I use to test different FreshLib features.

Also, you can create FreshLib projects from the template engine. Choose "File|New Application" from the
main menu or press the *Shift+Ctrl+N* shortcut.

In the dialog window, select the target directory and the application template -
"FreshLib portable console application" or "FreshLib portable GUI application",
then click *"OK"* and a new project will be created.

Press *Shift+Ctrl+S* to save all newly created files (you can change the filenames to whatever you like).

Then you can compile your application to a Linux or Windows executable file.

The latest development version of FreshLib can be downloaded from the [rep][repository] - *FreshLibDev* branch.

[FreshLibRef] 2_FreshLibDoc.htm

[rep] http://chiselapp.com/user/johnfound/repository/FreshIDE/
        
[!goto] _images/goto.png

[!procarg] _images/ProcArgumentsHint.png

[!code] _images/CodeCompletion.png

[!alias] _images/AliasFastChange.png

[!cross] _images/CrossReference.png

Deleted doc/source/2_FreshLibDoc.txt.

FreshLib reference
# Chapter 6 FreshLib reference

## Overview 

FreshLib is an assembly library that aims to ease the development of assembly
language applications that are freely portable between different platforms,
such as Win32 or Linux.

The library is coded in [FASM][flat assembler] syntax
and is intended to be easily used within [Fresh IDE],
although it could be used for plain FASM development.

The library consists of two layers: one that is OS dependent and
one that is OS independent. The OS dependent layer is kept very
small in order to make porting to other OSes easier. This layer
only provides an interface to the core OS functions, such as memory allocation,
file access, drawing functions, simple window management, etc.

The OS independent layer is responsible for the main application functionality:
creation of different kinds of windows and controls, processing of
system messages, work with dynamic strings, arrays and other data processing.

FreshLib is mainly intended for developing GUI applications, as they
are the most challenging to port across different platforms.
FreshLib is also created with visual programming in mind, so it contains
a flexible, event driven and OS independent template engine that allows
visual creation of application user interfaces.

FreshLib is in an early development stage and will probably be changed
many times in order to reach its objectives: to be small, fast and
easy to use.

The main intention is to keep bloat out of the library, while still providing
all the necessary tools for comfortable programming of a very wide
range of applications.

The architecture of FreshLib is open and it can be freely expanded
with other libraries without increasing the size of applications.
In other words, only those parts of the library that are actually
used will be compiled into the final executable.

------------------------------------------------------------------------

## About this manual 

This manual is a "work in progress". Any
part of it can be changed at any time.

Of course, some of the libraries described in this document, such as the macro,
system and data libraries, are more stable and finished. Therefore,
the chapters about these libraries are less likely to change.
Other libraries (like graphics and GUI) will be heavily modified,
so the manual will change accordingly.


------------------------------------------------------------------------

## Structure of the library. 

FreshLib contains many code and macro libraries, structured hierarchically
and depending on each other. Here is a part of the library directory tree:
;begin
    freshlib/
        compiler/
            Linux/
            Win32/
        data/
            Linux/
            Win32/
        dialogs/
            Linux/
            Win32/
        ...
;end
The library is structured to support different platforms transparently.
The tree consists of main sub-directories that
contain OS independent libraries, separated by topic.
For example, the *system* sub-directory contains libraries for accessing system
resources such as memory, files, etc.; *data* contains libraries for data
handling, and so on.
Every topic directory also has several sub-directories that contain OS
dependent code. These directories are named after the platform they serve.
(At the moment only Linux and Win32 are supported.)

------------------------------------------------------------------------

## Compiler setup for FreshLib use.

You can use any FASM compiler to compile applications that use FreshLib.
In order to be compiled properly, FreshLib needs environment variables
named `lib` and `TargetOS` to be defined.

The variable `lib` contains the path to the main directory of FreshLib
and the variable `TargetOS` contains the target platform the application
will be compiled for. The value of `TargetOS` is identical to the name of
the OS dependent directories in FreshLib.
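
As a rough illustration of how the two variables are consumed: FASM expands `%name%` references in `include` paths from the environment, so the same source line resolves to a different file when `TargetOS` changes. The second path in the sketch is hypothetical (the file name `example.asm` is an assumption, not a real FreshLib file) and is therefore shown only as a comment.

;begin
  ; Resolved through the `lib` variable to the freshlib.inc file
  ; in the main FreshLib directory.
include '%lib%/freshlib.inc'

  ; Hypothetical path, for illustration only: with TargetOS=Linux it would point
  ; into the Linux sub-directory of the "data" topic, with TargetOS=Win32 into Win32.
  ;   include '%lib%/data/%TargetOS%/example.asm'
;end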

There are several ways to define these variables, depending on
the compiler you use: [FASM], [FASMW] or [Fresh IDE].

1. The variables can be defined in the OS environment; see your OS documentation for details.
This approach is the most universal: it works for all kinds of FASM compilers. The main drawback is
that you have to use OS specific commands and will probably have to edit some system files.

1. Definition in the `[environment]` section of the *"fasm.ini"* or *"Fresh.ini"* file, depending on the
IDE you are using. This approach works for both FASMW and Fresh IDE, but in Fresh the same
effect can be achieved from inside the IDE. Besides, environment variables defined this way
become "global": they are active for all projects compiled with FASMW or Fresh.

1. From inside Fresh IDE.

In Fresh IDE, the environment variables are called *aliases*,
because they serve to provide short aliases for file paths. Fresh supports two lists of aliases
(environment variables): *global aliases* and *project aliases*. Global aliases are
defined in the IDE options dialog: *Options|IDE options|Aliases*. Here is a screenshot
of this dialog:

  [!_images/AliasesDlg.png][IDE options dialog, section "Aliases"]

The global aliases are active for every project compiled with Fresh and are stored in the Fresh.ini file,
inside the Fresh program directory.

Project aliases are defined in the Project options dialog: *Project|Project
options*, or from the project manager: click the *Settings* button at the top
and select *Project options*. The project options dialog is shown in the following screenshot:

  [!_images/ProjectOptions.png][Project based aliases can be edited in the project options dialog.]

The project aliases are stored inside the project file (.fpr) and they are project specific.

For FreshLib it is not important which list is used, but it is
more convenient to define the `lib` variable in the global list
and the `TargetOS` variable in the project aliases. This way
the common parameter (the location of the library) is set once for all projects, and
the particular parameter (the target OS) is set separately for every project.

There is also a very convenient way of changing the value of project
aliases: if several values are specified in a project alias, separated
by the `|` character (for example: `Win32|Linux`), Fresh will provide fast switching
between these values from the project manager options menu, as shown in the picture:

   [!_images/fastswitch.png][The aliases with more than one value will appear in the popup menu for fast changing.]

When Fresh looks up an alias name during compilation, it searches first in the project
alias list, then in the global aliases and finally in the OS environment variables. If
the alias is not found in any of these places, the compilation fails with an error.


[FASM] http://flatassembler.net

[FASMW] http://flatassembler.net

[Fresh] http://fresh.flatassembler.net

[Fresh IDE] http://fresh.flatassembler.net

------------------------------------------------------------------------

## FreshLib compiling options 

FreshLib uses some options to set the behavior of the compiler and of the
different macro libraries.
These options are defined as local constants of the label *"options."*
Here is the list:

[#options.FastEnter] `options.FastEnter` controls the behavior of the [#proc] macro.

When *options.FastEnter = 1* the procedure entry/leave code is generated with the
faster, but bigger, push ebp/pop ebp instructions.

When *options.FastEnter = 0* the enter/leave instructions are used.
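
The difference between the two settings can be sketched in plain x86 code. The labels and the frame size of 8 bytes are illustrative, and the exact instruction sequence emitted by the [#proc] macro is an assumption here; only the general shape of the two variants is shown.

;begin
use32

  ; Compact variant (the options.FastEnter = 0 style).
SmallFrame:
        enter   8, 0            ; 4 bytes: saves EBP, sets the frame, reserves 8 bytes
        ; ... procedure body ...
        leave                   ; 1 byte: releases the frame and restores EBP
        retn

  ; Explicit variant (the options.FastEnter = 1 style).
FastFrame:
        push    ebp             ; 1 byte
        mov     ebp, esp        ; 2 bytes
        sub     esp, 8          ; 3 bytes
        ; ... procedure body ...
        mov     esp, ebp        ; 2 bytes
        pop     ebp             ; 1 byte
        retn
;end

The explicit variant takes 9 bytes of entry/leave code instead of 5, which is the size cost mentioned above.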

[#options.ShowSkipped] `options.ShowSkipped` controls the information displayed during compilation.

When *options.ShowSkipped = 1* the compiler will display in the output window the procedures
that are not compiled because they are not used in the program.

[#options.ShowSizes] `options.ShowSizes` controls the behaviour of the DispSize macro.

When *options.ShowSizes = 0* the macro [#DispSize] will be disabled.

[#options.DebugMode] `options.DebugMode` controls the behaviour of the debug macros.

When *options.DebugMode = 1* the macros from the [#simpledebug] library generate debug code and
a debug console is created when the application runs.

When *options.DebugMode = 0* these macros will not generate code and the debug console will not be
created.
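
A minimal sketch of how these options might be set in a program. The option names and their meanings come from the list above; placing the assignments between the two library includes is an assumption, not something this manual states.

;begin
include "%lib%/freshlib.inc"

@BinaryType GUI

  ; Assumed placement: before the library code is included.
options.FastEnter   = 1         ; faster, but bigger procedure entry/leave code
options.ShowSkipped = 1         ; list the procedures that were skipped as unused
options.ShowSizes   = 0         ; the DispSize macro is disabled
options.DebugMode   = 0         ; no debug code, no debug console

include "%lib%/freshlib.asm"

start:
        InitializeAll

        ; Place your code here

        FinalizeAll
        stdcall Terminate, 0

@AllImportEmbeded
@AllDataEmbeded
;end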

------------------------------------------------------------------------

## FreshLib code conventions 

### Naming conventions 

1. Names prefixed with one or more underscores ("_") are not recommended for use by the user.
These are internally used labels that can be changed later. The more underscores in the prefix, the
more "internal" the given identifier is.

For example, one underscore (like this: `_AlmostPrivate`) means "use it carefully".

Three underscores (for example `___SomeVeryPrivateLabel`) mean "don't use it at all":
it is for internal use only and will be changed later!

2. The names are designed to be used with a code completion editor, i.e. there are no long common
prefixes in the names, only short "class" prefixes.
For example, most of the procedures in StrLib begin with the "Str" prefix.

3. In general, FreshLib uses the [CamelCase] naming convention, with constants in lowerCamelCase and
procedures in UpperCamelCase.

4. All local labels and procedure arguments are prefixed with a dot (".").

5. Almost all of the [#struct] and all of the [#object] definitions are prefixed with "T", for
example [#TTimer] or [#TButton].

6. The file name convention. Files with the `.inc` extension do not contain any code or
data definitions. Only macro and constant definitions are permitted.

Files with the `.asm` extension can define code and data. However, code and data in FreshLib
exist in the compiled binary only if they are used. Unused data or code included in
the binary should be considered a bug.

[CamelCase]  http://en.wikipedia.org/wiki/CamelCase


### Register preserving convention

1. The rule is: preserve everything you use. All procedures in FreshLib preserve all registers,
except those used for returning result values.

;quote
Note: There is a small deviation from this rule: in the object handling procedures, some registers
are preserved internally, so the object class procedures may not preserve them.
See the [#object.asm] library for details.

Another exception is the user defined callbacks: FreshLib always preserves the registers before calling
callback procedures and restores them afterwards.
;end

2. CF is widely used for returning an error status or a boolean result.
As a rule, *CF=0* means no error and *CF=1* means error.

The use of CF is always described in the description of the respective procedure.

3. Procedures can return results in more than one register. As a rule, EAX is the result register
for most procedures, but sometimes other registers are used in order to provide a better
interface for assembly programming.

For example, a number of procedures return X, Y values in the EAX and EDX registers.

* EAX: commonly used for returning 32 bit values;

* EDX: the second choice, used together with EAX for 64 bit values, or as a second returned value.

* ECX: usually some count. For example, if EAX returns a pointer to some memory, ECX will contain the
data size. See [#LoadBinaryFile] for an example.
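
The conventions above can be sketched with a small hand-written routine. `DivideChecked` and `Caller` are hypothetical names, not part of FreshLib, and the code uses plain `call`/`retn` instead of the library macros so that it stands alone.

;begin
use32

  ; Hypothetical procedure, not part of FreshLib.
  ; Arguments: EAX = dividend, ECX = divisor
  ; Returns:   CF=0, EAX = quotient, EDX = remainder
  ;            CF=1 if the divisor is zero; EAX and EDX are left unchanged
  ; ECX is never modified, only the result registers change.
DivideChecked:
        test    ecx, ecx
        jz      .error
        xor     edx, edx        ; zero extend the dividend into EDX:EAX
        div     ecx
        clc                     ; flags are undefined after DIV, so clear CF explicitly
        retn
.error:
        stc
        retn

Caller:
        mov     eax, 100
        mov     ecx, 7
        call    DivideChecked
        jc      .failed
        ; here EAX = 14 and EDX = 2
        retn
.failed:
        ; handle the error here
        retn
;end

Checking CF with a single `jc` right after the call is what makes this convention convenient in assembly code.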

------------------------------------------------------------------------

## Using FreshLib

There are only two files the user should include in order to use FreshLib. Both are located
in the "freshlib" directory, usually referred to by the %lib% directory alias.

These files are:

* `%lib%/freshlib.inc` - contains all macro and OS dependent equates definitions.

* `%lib%/freshlib.asm` - contains all the code of FreshLib. Only the code that is actually used will be included
in the resulting binary.

The minimal application with FreshLib has the following code:

;begin
include "%lib%/freshlib.inc"

@BinaryType GUI

include "%lib%/freshlib.asm"

start:
        InitializeAll

        ; Place your code here

        FinalizeAll
        stdcall Terminate, 0

@AllImportEmbeded
@AllDataEmbeded
;end

The macros [#@BinaryType], [#@AllImportEmbeded], [#@AllDataEmbeded] will be explained later in the 
next chapter.

------------------------------------------------------------------------
## Data definitions in FreshLib program

FreshLib uses advanced data definition macros that allow data definitions to be mixed with the
code, but then grouped and inserted in the data section of the program.
This kind of definition greatly improves the readability of the program, because it keeps the data
near the code that uses it. This is especially important in big projects where the code is spread
among multiple files.
There are two main types of data: uninitialized and initialized.

The initialized data definitions are enclosed in the [iglobal], [endg] macros and the uninitialized
data in the [uglobal], [endg] macros. Inside these blocks you can use all FASM data definition directives
or any other valid FASM code. Note, however, that in a `uglobal` block only the size of the data matters;
if you define initialized data there, the values will be lost.

;begin
iglobal
  MyGlobalVar dd 123, 456, 789
endg

uglobal
  SomeUndefinedVar rd 1
  SomeArray        rb 256
endg

;end

Another useful macro is [text] (actually it is a `struc`). It defines a string constant anywhere in
the source code. The string constant is later placed in the data section of the program.

When data definition macros are used, the user should define the data section with the macro [@AllDataEmbeded] or
[@AllDataSection] at an appropriate place in the program, where all the data should reside.
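
A minimal sketch of a string constant defined next to the code that uses it. It assumes that `text` is invoked in the usual FASM `struc` form (`label text "string"`); the label `cGreeting` is hypothetical.

;begin
include "%lib%/freshlib.inc"

@BinaryType GUI

include "%lib%/freshlib.asm"

start:
        InitializeAll

        mov     esi, cGreeting  ; address of the string constant defined below

        FinalizeAll
        stdcall Terminate, 0

  ; Written here, near its use; the bytes are emitted with the rest of the
  ; data at the place where @AllDataEmbeded appears.
cGreeting text "Hello from FreshLib"

@AllImportEmbeded
@AllDataEmbeded
;end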


------------------------------------------------------------------------

## FreshLib directory "compiler/"

This directory contains only one macro library: *"executable.inc"*

### "executable.inc" library 

This library defines the macros used for creating the main structure of
the program. The implementation of these macros is OS dependent, because the type of the
executable file is OS dependent: PE for Win32 and ELF for Linux. The use of the library, however,
is OS independent and is common for all supported OSes.
Depending on the value of the `TargetOS` alias, the library will create a PE executable or DLL
`(%TargetOS%='Win32')`, or an ELF executable or shared library `(%TargetOS%='Linux')`.

*NOTE:* Each of the macros from this library must be used only once in the program.

--------------

[#@BinaryType] `macro @BinaryType type`

This macro sets the type of the executable. The argument `type` can accept one of the following
values: `GUI`, `console` or `DLL`.

This macro also begins the main code section of the executable and defines the entry label of the 
program. The entry label is fixed to `start:`.

For example, the following code will tell the compiler to compile the program as a GUI application:
;begin
      include '%lib%/freshlib.inc'
      @BinaryType GUI
;end

--------------

[#@AllDataSection] `macro @AllDataSection`
[#@AllDataEmbeded] `macro @AllDataEmbeded`

These macros emit all the data defined in the program in `uglobal` and `iglobal` blocks,
as well as the text constants defined using the `text` macro.

@AllDataSection defines the data in a separate program "section". The meaning of the term "section"
differs depending on the TargetOS definition. In Win32 it is `section '.data' data readable writeable`.
In Linux it is `segment readable writeable`.

@AllDataEmbeded defines the data embedded in the code section of the program (created by the @BinaryType macro).

Only one of these macros must be used. Usually @AllDataEmbeded will create a smaller executable, although
some negative effects are possible, because mixing data and code in one section is often considered "bad practice".

--------------

[#@AllImportSection] `macro @AllImportSection`
[#@AllImportEmbeded] `macro @AllImportEmbeded`

These macros define the import section of the program. This section is created
automatically, depending on which functions are used in the program.

Similar to the data section macros, @AllImportSection will create this data as a separate section,
while @AllImportEmbeded will try to embed this data in the code section.
(NOTE: In the ELF executable format, the import data must be in a separate segment, so
with TargetOS=Linux @AllImportSection and @AllImportEmbeded are equivalent.)

If an embedded import section is used together with an embedded data section, the import section should be
defined before the data section, because the uninitialized data definitions must reside at the end
of the section (the code section in the case of embedded data).
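
For comparison with the embedded variants, here is the same minimal program skeleton using the separate section macros. It is only a sketch: keeping the imports before the data simply mirrors the ordering rule for the embedded case and is not stated as a requirement for separate sections.

;begin
include "%lib%/freshlib.inc"

@BinaryType GUI

include "%lib%/freshlib.asm"

start:
        InitializeAll

        ; Place your code here

        FinalizeAll
        stdcall Terminate, 0

  ; Imports and data go to their own sections instead of the code section.
  ; Each macro is used exactly once, and never together with its embedded variant.
@AllImportSection
@AllDataSection
;end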

------------------------------------------------------------------------

## FreshLib directory "equates/"

[#allequates.inc] *"allequates.inc"* library. This library defines all
constants and structures needed for OS dependent parts of FreshLib.

The user should never use these constants and structures directly in a portable program.

The constants and structures that the user should use are defined in the respective libraries,
not in *"allequates.inc"*.

This library is included automatically by the "%lib%/freshlib.inc" file, so the user does not need to care
about this library at all.

------------------------------------------------------------------------

## FreshLib directory "imports/" 

[#allimports.asm]Another directory that contains only OS dependent definitions is *"imports/"* with a library
file to be included in the project: *"allimports.asm"*

This file is automatically included in the [#@AllImportSection] and [#@AllImportEmbeded] macros.
Then it will generate the proper import section, depending on the target platform and functions used 
by the OS dependent parts of FreshLib.

FreshLib contains a very big catalog of shared libraries for Windows and a decent set of Linux
shared libraries. The import macros used by FreshLib put in the import section of the
program only the functions actually used by the program, so redundant import items are never defined.

The user must never call imported functions directly from inside a portable application, unless
the imported dynamic library is portable as well (sqlite3.dll, for example).

------------------------------------------------------------------------

## FreshLib directory "macros/" 

This directory contains several libraries that provide common convenience macros for use
in big assembly projects.

All these libraries are included automatically by the "%lib%/freshlib.inc" file.

There is no overhead including all these libraries, because there is no code to be generated, 
just macro definitions. 
There is a little delay in compile time but thanks to fasm's speed, it is barely noticeable.

Let's examine each of these libraries.

### "_stdcall.inc" library 

This library provides macros for defining and calling procedures that receive their
arguments on the stack. It supports the STDCALL and CCALL calling conventions.

-------------------

[#proc] `macro proc name, [arg]`
[#begin] `macro begin`
[#endp] `macro endp`
[#return] `macro return`
[#cret] `macro cret`
[#locals] `macro locals`
[#endl] `macro endl`

These macros define a procedure, create a stack frame for the local variables and define
symbolic names for the arguments. The macro "proc" defines the global label "name" as the name
of the procedure. All arguments and local variables are defined as local labels with regard
to the name of the procedure. That is why all arguments and local variables must have names
beginning with a dot.

Between the lines with *proc* and *begin*, any number of local variables can be defined.
The macro *begin* marks the beginning of the procedure code.

The macro *endp* marks the end of the procedural code.

A procedure returns with the macro *return* or *cret*, depending on
the calling convention in use: *return* removes the arguments from the stack and *cret*
does not.

Inside the procedure, a secondary stack frame can be allocated with the pair *locals* and *endl*.
All data definitions enclosed between these macros define a secondary stack frame that
is a continuation of the stack frame defined between *proc* and *begin*.

Any number of *locals* and *endl* pairs can be used, but all of these secondary stack frames
overlap each other. This feature is intended to save stack
space and, at the same time, to provide descriptive variable names for the different parts of more
complex procedures.

For example (in Win32), suppose we have a complex window procedure that has to process
multiple messages.

One of the message handlers may need one variable `.rect`.

Another message handler may need two variables called `.point1` and `.point2`.

But the procedure as a whole is never going to need all those variables at the
same time, because it processes only one message at a time. On the other hand, it
may need the variable `.ctrldata` for every message processed. The optimal solution
is to define the variables as shown in the following example:

;begin
    proc CtrlWinProc, .hwnd, .wmsg, .wparam, .lparam
    .ctrldata dd ?
    begin
        invoke GetWindowLong, [.hwnd], GWL_USERDATA
        mov    [.ctrldata], eax

        cmp    [.wmsg], WM_MESSAGE1
        je     .message1
        cmp    [.wmsg], WM_MESSAGE2
        je     .message2
        return

    .message1:
    locals
      .rect RECT
    endl
        ; do something.
        return

    .message2:
    locals
      .point1 POINT
      .point2 POINT
    endl
        ; do something. 
        return
    endp
;end

The assignment of the stack memory for the above example is shown in the table:
;table
(1, 2) Address

(3, 1) Stack frames
;-----------------------
Common

Locals 1

Locals 2
;-----------------------
EBP-20

--

.rect.left

.point1.x
;-----------------------
EBP-16

--

.rect.top

.point1.y
;-----------------------
EBP-12

--

.rect.right

.point2.x
;-----------------------
EBP-8

--

.rect.bottom

.point2.y
;-----------------------
EBP-4

.ctrldata

--

--

;end

;;begin
;;+---------+---------+--------------+-------------+
;;|         |           Stack frames               |
;;+ address +---------+--------------+-------------+
;;|         |  common |   locals 1   |   locals2   |
;;|:-------:|:-------:|:------------:|:-----------:|
;;| EBP-20  |         | .rect.left   |   .point1.x |
;;| EBP-16  |         | .rect.top    |   .point1.y |
;;| EBP-12  |         | .rect.right  |   .point2.x |
;;| EBP-08  |         | .rect.bottom |   .point2.y |
;;| EBP-04  |.ctrldata|              |             |
;;+---------+---------+--------------+-------------+
;;        Procedure local labels memory map.
;;end

As you can see, *.rect* occupies the same memory as *.point1* and *.point2*,
but *.ctrldata* is never overlapped and exists independently.

As a general rule, you have to use the definitions between "proc" and "begin" for local
variables that are used in every call of the procedure and separate locals/endl definitions
for variables needed for the particular branches inside the procedure.
This approach will always provide the optimal size for the locals stack frame.

--------------------

[#initialize] `macro initialize`
[#finalize] `macro finalize`

The macros "initialize" and "finalize" define a special type of procedure that, during
compilation, is registered in one of two separate lists - one for "initialize" and one for
"finalize" procedures. Procedures defined with "initialize" and "finalize" must not have any
arguments.

After that, using the macros "InitializeAll" and "FinalizeAll", all these procedures can be
called at once. "initialize" procedures are called in the order of their definition and "finalize"
procedures in reverse order.

These macros provide a standard and consistent way for initialization and
finalization of the libraries and modules of the application.

FreshLib uses this mechanism and the user is free to use it also.
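
As an illustration, a module might register its startup and cleanup code as sketched below. The names InitMyModule, FreeMyModule and hBuffer are hypothetical, and the sketch assumes that "initialize" and "finalize" take the procedure name in the same way as "proc" does. Error checking is omitted for brevity:
;begin
    uglobal
      hBuffer dd ?
    endg

    initialize InitMyModule
    begin
            stdcall GetMem, 1024
            mov     [hBuffer], eax
            return
    endp

    finalize FreeMyModule
    begin
            stdcall FreeMem, [hBuffer]
            return
    endp

    ; somewhere in the main program:
            InitializeAll           ; calls InitMyModule and all other "initialize" procedures
            ; ... the application code ...
            FinalizeAll             ; calls the "finalize" procedures in reverse order
;end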


-------------------------


[#stdcall] `macro stdcall proc, [arg]`
[#ccall]   `macro ccall proc, [arg]`
[#invoke]  `macro invoke proc, [arg]`
[#cinvoke] `macro cinvoke proc, [arg]`

These macros call procedures with the STDCALL and CCALL calling conventions.

The `stdcall` macro pushes the arguments to the stack in reverse order and then calls
the procedure with the label *proc*. As the macro "stdcall" does not perform
any stack cleanup (this duty is assigned to the procedure), the arguments can be
pushed freely, using, for example, separate push instructions for some of
the arguments and the stdcall line for the remaining ones.
This can be very convenient in some cases. For example, see the following source:
;begin
    stdcall CreateSomeObject
    push    eax
    stdcall DoSomething
    stdcall DeleteSomeObject
;end
Here, the procedure DoSomething changes the value of eax, so the handle is saved
on the stack. The procedure DeleteSomeObject needs one argument: the handle of
the object. But as the proper value is already on the stack, it is
pointless to pop the value and then push it again. So the source calls
DeleteSomeObject without any arguments. The procedure knows the proper number
of arguments (one in this example) and cleans the stack properly.

The standard (and wrong) approach is to pop the argument from the stack and then
to pass it to the procedure explicitly in the stdcall statement:
;begin
    stdcall  CreateSomeObject
    push     eax                ; save the handle.
    stdcall  DoSomething
    pop      eax                ; ??? Why ???
    stdcall  DeleteSomeObject, eax
;end
This source generates a meaningless instruction sequence:
;begin
    pop      eax
    push     eax
;end
*invoke* macro is the same as "stdcall" with one difference: it calls the procedure
indirectly ( `call [proc]` instead of `call proc` ). This mechanism is usually used to call
functions imported from external dynamically linked libraries.
Of course, the imported functions can also be called with `stdcall [someproc]`, but the *invoke*
macro helps to distinguish which procedures are imported and which are internal to the
program.

*NOTE:* The user should never use *invoke* in portable programs, because such programs
never call OS dependent imported functions directly.


*ccall* macro calls a procedure with the CCALL convention. This means that the procedure returns
with a simple "retn", without removing the parameters from the stack. The "ccall" macro then emits
the instructions that remove the arguments from the stack.

Because ccall has to know the exact count of the passed arguments, all arguments have to be passed
explicitly as values in the ccall statement.
Tricks like the one described above will not work and will leave the stack improperly cleaned after
the call.
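
A sketch of what this means in practice, for a hypothetical CCALL procedure SomeCFunc with two dword arguments. The expansion shown in the comments is an assumption about the generated code that follows the description above; it is not a listing of the real macro output:
;begin
        ccall   SomeCFunc, eax, ebx
        ; roughly equivalent to:
        ;       push    ebx
        ;       push    eax
        ;       call    SomeCFunc
        ;       add     esp, 8          ; the caller removes the two dword arguments
;end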

"cinvoke" is the same as ccall, but uses an indirect call. The "cinvoke" macro exists for the
same reason as the "invoke" macro: better legibility of the source.

*About the calling conventions:* While all Win32 dynamically linked libraries use the STDCALL
convention, most (if not all) Linux libraries use the CCALL convention.

*All code libraries of Fresh use the STDCALL calling convention on every platform.*

-------------------------------------------------------------------------------------

### "_globals.inc" library 

This library defines several macros intended to deal with data definitions.

Usually all data definitions have to be placed in a special section of the
executable file ([#_DataSection] in FreshLib). This is not very convenient, because the code
that processes the data and the data definitions must reside in separate places of the source
code, and in most cases even in different files.

The idea of the *"_globals.inc"* macro library is to allow the data to be
described anywhere in the source code, while the actual definitions are emitted
all at once, at the place the programmer wants - usually in the data section of the
program.

--------------------

[#uglobal] `macro uglobal`
[#iglobal] `macro iglobal`
[#endg] `macro endg`
[#IncludeAllGlobals] `macro IncludeAllGlobals`

*uglobal* begins a block of undefined data definitions. The block ends with the *endg*
macro. Between "uglobal" and "endg" any number of data definitions can be
inserted.

Note that because a uglobal block defines undefined data, only the labels
and the sizes of the data have meaning inside this block. Data defined in such a block
does not increase the size of the executable file, but is
allocated when the executable is loaded in memory.

The undefined data is defined later, at the place where the "IncludeAllGlobals"
macro resides. In order not to increase the size of the executable file,
the undefined data is always placed at the end of all data definitions.

The "iglobal" macro, again combined with "endg", defines initialized data. The data
defined in the block is created at the "IncludeAllGlobals" statement.

This block increases the size of the executable file, because it contains
meaningful data that has to be included in the file.

Actually, neither *uglobal* nor *iglobal* blocks define any data immediately.
Instead, the data definitions are stored in a list. The real definition occurs
later, when the *IncludeAllGlobals* macro is invoked.
For this reason, *IncludeAllGlobals* must be placed in the source after all
global data blocks.

The programmer should never use IncludeAllGlobals explicitly. This macro is invoked
by the [#@AllDataSection] and [#@AllDataEmbeded] macros.
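
A short sketch of how these blocks can sit next to the code that uses them. The names LineCount, LineBuffer, DefaultWidth and CountLine are hypothetical:
;begin
    uglobal
      LineCount  dd ?           ; only reserves space, adds nothing to the file
      LineBuffer rb 256
    endg

    iglobal
      DefaultWidth dd 80        ; initialized data, stored in the file
    endg

    proc CountLine
    begin
            inc     [LineCount]
            return
    endp
;end
The data is not emitted where the blocks appear, but at the place where the data section macro is used.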

---------------------

[#text] `struc text [val]`

The macro *"text"* is actually a structure (struc), so it needs to be preceded by a label name.

This macro accepts string or number arguments. When it is invoked with string arguments,
it defines a zero terminated string constant, and also a local constant *.length*
equal to the length of the string without the terminating zero. When invoked with a number as argument,
*"text"* defines a label at the address *val* and does not define the .length constant.

The *"text"* macro, in the same way as *iglobal* and *uglobal*, simply stores the string data for
deferred definition. The definition occurs at the IncludeAllGlobals invocation.
Note that the real definition is made only if the string is used in the program.
Unused strings are not defined.

Look at the following example:
;begin
        myName text 'John',$20,'Smith'
;end
This code defines the data and the constant labels in the following way:
;begin
        if used myName
          myName db 'John Smith'
          .length = $-myName
                 db 0
        end if
;end

Why define a separate macro for strings instead of using a normal iglobal
block? First, the *text* macro defines real data only if this data is used
somewhere in the source. This prevents bloating the program with
unneeded data definitions.

;quote
Also, the macro "text" was planned to check the content of the strings
and never define the same string more than once. For repeated strings,
the macro would return the pointer to the already defined string constant.

In that case, it would be very convenient and harmless to use unnamed string constants
in the procedure calling macros - stdcall, ccall etc.

Unfortunately, regardless of the power of the fasm macro language, this functionality
cannot be implemented. Or, more precisely, it can be implemented, but the
implementation is too slow for use in any real project.

This inefficient implementation is still left inside the file *"_globals.inc"* as a
commented block that defines a macro named "*InitStringList*". If someone has
ideas about fixing this problem, please send them to me!
;end

------------------

[#var] `macro var expr`

The macro *var* defines a dword variable with a given value. It is used as follows:
;begin
        var MyVar = 1234
;end
The only difference from the usual use of the *dd* directive is that the variable is
defined only if it is used in the source.

Note that the variable is created at the place where *var* is used, so you need to place it
inside an *iglobal* block if you want it to be defined in the global data area.
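
For example, a sketch with the hypothetical variable name RetryCount:
;begin
    iglobal
      var RetryCount = 3        ; defined only if RetryCount is referenced somewhere
    endg
;end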

---------------------

### "_struct.inc" library 

This library contains only two simple macros:

[#struct] `macro struct name`
[#ends] `ends`

The *struct* macro provides easy creation of data structures. The "struc"
directive in FASM does not create actual labels, but only a template
for the label definitions. So, we need to create an instance of the data
structure in order to have the addresses and offsets of its fields.

But very often we don't have a static data structure of the given type, but a data
structure pointed to by one of the registers. In this case, in order to use
the offsets of the fields of the data structure, we need to define at least one
virtual instance of the structure at address 0. Then we can use the values of
the fields as offsets in the instructions - for example:
;begin
    mov eax, [esi+RECT.right]
;end
This is exactly what the "struct" macro does. Besides defining the "struc"
structure with the given fields, it creates a single virtual instance of this
structure, to be used later for register addressing.
The macro also defines the constant *sizeof.StructureName*, equal to the
size of the structure in bytes.
In everything else it behaves exactly as the struc directive.

The syntax of struct macro is the following:
;begin
    struct StructureName
      .field1 dd ?
      .field2 RECT
      .fieldN:
    ends
;end
The definition begins with "struct" followed by the structure name.
The definition ends with "ends" directive. Between both, any local label
definition becomes a member of the structure.
The above definition results in the following code:
;begin
    struc StructureName {
      .field1 dd ?
      .field2 RECT
      .fieldN:
    }
    virtual at 0
      StructureName StructureName
      sizeof.StructureName = $
    end virtual
;end
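
Once a structure is defined this way, its field offsets and its size can be used with register addressing. A small sketch with the hypothetical structure TPoint3D:
;begin
    struct TPoint3D
      .x dd ?
      .y dd ?
      .z dd ?
    ends

    ; esi points to an array of TPoint3D items
            mov     eax, [esi+TPoint3D.y]   ; TPoint3D.y is the offset 4
            add     esi, sizeof.TPoint3D    ; advance by 12 bytes to the next item
;end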

--------------------------------------------------------------

### "_display.inc" library 

This library contains macros that enhance the functionality of standard FASM
"display" directive.


[#disp] `macro disp [arg]`

The macro "disp" displays the strings given in the arguments, just as "display"
FASM directive does. Additionally it can display numbers in any given radix:
;begin
    disp <number, radix>
;end

---------------

[#DispSize] `macro DispSize Text, Sz`

"DispSize" is a very specialized macro that displays the text and a number in
the following form:
;begin
Size of [Text] is: Sz bytes
;end
The size number is automatically scaled to bytes or kbytes, depending on the
value of Sz.

This macro allows easy display and control of the sizes of particular areas of
the program - data structures, subroutines etc.

DispSize macro behavior is controlled by [#options.ShowSizes] option.

-----------------

[#display] How Fresh implements "display" directive

There are some specifics in Fresh concerning message displaying. The "display"
directive in Fresh works in a slightly different way than the original FASM directive.

It outputs text to the Fresh message window. Each message can have one of six icons,
or it can have no icon at all. And because the message window is implemented as a
TreeView control, you can organize your messages into groups (directories).

The implementation is a bit "tricky": when you display a character whose code is
less than 16, it is interpreted in a special way. Characters from 1 to 6 set the
icon of the current message. It sounds complicated, but it is quite simple. Try:
;begin
    display 2, "some message"
;end
It will display "some message" with an error icon. Other codes are used for
controlling the directory structure. Try typing the following lines and see what
happens:
;begin
    display 3, "message at root", 0x09
    display 3, "child message1", 0x0a
    display 3, "child message2", 0x0d
    display 3, "again at root", 0x0a
;end
Of course you don't have to put each message in a separate display directive. You
can write the following with the same result:
;begin
    display 3, "at root",$09,3,"child1",$0a,3,"child2", $0d,3,"again at root",$0a
;end

Here is the complete list of all special characters and their meanings:
;table
char

meaning

note
;-------------
$01

set current icon to "warning"

 [?_images/warning.gif]
;------------------
$02

set current icon to "error"

 [?_images/error.gif]
;-----------------
$03

set current icon to "info"

 [?_images/information.gif]
;----------------
$04

set current icon to "find"

 [?_images/find.gif]
;----------------
$05

set current icon to "none"

--
;----------------
$06

set current icon to "debug" 

 [?_images/debug.gif]
;----------------
$08 

end current row and set one level back.

--
;----------------
$09

end current row and set it as new directory.

--
;----------------
$0a

end current row and keep current level.

--
;----------------
$0d

end current row and set current level to root level.

--
;end


;;begin
;;+------+--------------------------------------------------------+
;;| char |   meaning                                              |
;;|:----:|:-------------------------------------------------------|
;;| $01  |   set current icon to "warning"                        |
;;| $02  |   set current icon to "error"                          |
;;| $03  |   set current icon to "info"                           |
;;| $04  |   set current icon to "find"                           |
;;| $05  |   set current icon to "none"                           |
;;| $06  |   set current icon to "debug"                          |
;;| $08  |   end current row and set one level back.              |
;;| $09  |   end current row and set it as new directory.         |
;;| $0a  |   end current row and keep current level               |
;;| $0d  |   end current row and set current level to root level  |
;;+------+--------------------------------------------------------+
;;          display directive special characters
;;end


-------------------


## FreshLib directory "system/" 


-----------------


### "memory.asm" library 

This library provides an OS independent way of allocating, reallocating and freeing
dynamic memory blocks.
All other libraries in FreshLib that need dynamic memory use this library.

A user who needs such memory blocks should use it as well.

--------------

[#GetMem] `proc GetMem, .size`

Allocates `[.size]` bytes of dynamic memory.

Returns:

 `CF=0`; EAX = pointer to the allocated memory;

 `CF=1`; EAX=0 if the memory cannot be allocated.

The allocated memory is filled with zeros.

------------------

[#FreeMem] `proc FreeMem, .ptr`

Frees the dynamically allocated memory block pointed to by `[.ptr]`.
Returns nothing.

-------------------

[#ResizeMem] `proc ResizeMem, .ptr, .newsize`

Reallocates the memory block at address `[.ptr]` to the new size in `[.newsize]`.

Returns:

CF=0; EAX = pointer to the allocated memory;

CF=1; EAX=`[.ptr]` if the memory cannot be reallocated.
In this case, the memory block is not changed.

The added part of the memory block is not zeroed.
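
A typical sequence might look like the sketch below. The local variable `.ptrBuf` and the labels `.error` and `.keep_old` are hypothetical. The sketch also assumes that a successful ResizeMem may return a different address than the original one:
;begin
            stdcall GetMem, 256
            jc      .error                  ; CF=1: the allocation failed
            mov     [.ptrBuf], eax

            stdcall ResizeMem, [.ptrBuf], 1024
            jc      .keep_old               ; CF=1: the old block is still valid
            mov     [.ptrBuf], eax          ; assumption: the block may have moved

    .keep_old:
            ; ... use the buffer ...
            stdcall FreeMem, [.ptrBuf]
;end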

-----------------


### "files.asm" library 


[#FileOpen] `proc FileOpen, .filename`

The procedure opens the file with filename in `[.filename]` for reading.

Returns:

`CF=0`; EAX = Handle to the file.

`CF=1`; EAX = Error code.

-------------------

[#FileCreate] `proc FileCreate, .filename`

Creates a file or opens the existing one and truncates its size to 0. The file is opened for writing.

Returns:

`CF=0`; EAX = Handle to the file.

`CF=1`; EAX = Error code.


----------------------

[#FileClose] `proc FileClose, .handle`

Closes the previously opened file.

Returns:

`CF=0`; EAX = Handle to the file.

`CF=1`; EAX = Error code.

----------------------

[#FileRead] `proc FileRead, .handle, .buffer, .count`

Reads `[.count]` bytes from the file `[.handle]` into the buffer at `[.buffer]`.

Returns:

`CF=0`; EAX = The count of actually read bytes.

`CF=1`; EAX = Error code.

---------------------

[#FileWrite] `proc FileWrite, .handle, .buffer, .count`

Writes `[.count]` bytes from the buffer `[.buffer]` to the file with handle `[.handle]`.

Returns:

`CF=0`; EAX = The count of actually written bytes.

`CF=1`; EAX = Error code.

------------------------

[#FileSeek] `proc FileSeek, .handle, .dist, .direction`

Moves the file pointer of the file `[.handle]` by `[.dist]` bytes relative to `[.direction]`.

Direction is one of the following values:

`fsFromBegin` — relative to the file begin.

`fsFromEnd` — relative to the file end (then `[.dist]` should be negative).

`fsFromCurrent` — relative to the current position.

[#FileDelete] `proc FileDelete, .filename`

Deletes the file with filename in `[.filename]`.

Returns:

`CF=0`; The file was deleted.

`CF=1`; EAX = Error code.

--------------------------

[#GetErrorString] `proc GetErrorString, .code`

Returns in EAX a pointer to the human readable error message corresponding to the error code
passed in `[.code]`.

The message string has to be passed to FreeErrorString when it is no longer needed.

---------------------------

[#FreeErrorString] `proc FreeErrorString, .ptrString`

Frees the error string `[.ptrString]`, previously returned by GetErrorString. As the
error strings are allocated by the OS, they have to be freed by the OS as well.
Returns nothing.

--------------------------

[#LoadBinaryFile] `proc LoadBinaryFile, .ptrFileName`

Loads the whole file `[.ptrFileName]` into a dynamically allocated memory block.

Returns:

`CF=0`; EAX = pointer to the allocated memory; ECX = the size of the loaded file.

`CF=1`; EAX = Error code. ECX = 0; The memory is not allocated.

The allocated memory has to be freed after use with [#FreeMem].

---------------------------

[#SaveBinaryFile] `proc SaveBinaryFile, .ptrFileName, .aptr, .size`

Creates or overwrites the file `[.ptrFileName]` with `[.size]` bytes from the buffer `[.aptr]`.

Returns:

`CF=0`; EAX = count of the bytes actually written;

`CF=1`; EAX = error code;
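
The two procedures can be combined, for example, to copy a file. In this sketch the local variables `.ptrSrcName`, `.ptrDstName` and `.ptrData` and the label `.error` are hypothetical:
;begin
            stdcall LoadBinaryFile, [.ptrSrcName]
            jc      .error                  ; EAX = error code
            mov     [.ptrData], eax         ; ECX = the size of the loaded file

            stdcall SaveBinaryFile, [.ptrDstName], [.ptrData], ecx
            pushf                           ; keep CF from the save
            stdcall FreeMem, [.ptrData]     ; the buffer is freed in both cases
            popf
            jc      .error                  ; EAX may not survive FreeMem; save it if needed
;end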

--------------------------

[#FileExists] `proc FileExists, .ptrFileName`

Checks the existence of the file with name in `[.ptrFileName]`.

Returns:

`CF=1`; the file *does not* exist.

`CF=0`; the file exists.

The existence of the file is checked using the [#FileOpen] procedure.
If the file can be opened, it is considered existing.

-----------------

### "process.asm" library 


[#Terminate] `proc Terminate, .exit_code`

Terminates the application and all of its threads.
Returns `[.exit_code]` to the OS.

This procedure never returns, because the application stops.

----------------------

[#ThreadCreate] `proc ThreadCreate, .ptr_to_function, .ptr_to_args`

Creates a new thread. `[.ptr_to_function]` points to the thread procedure.
The thread procedure should have one argument.
When the thread starts, `[.ptr_to_args]` is passed as the thread argument.

Returns:

`CF=0`; EAX = handle of the new thread. The meaning of this value can differ between OSes,
but it always identifies the thread.

`CF=1`; EAX = error code;

------------------------

[#MutexCreate] `proc MutexCreate, .ptrName, .ptrMutex`

Creates a new mutex with the name `[.ptrName]` and saves its handle to the `[.ptrMutex]` variable.

The calling thread takes the ownership of the mutex.

If `[.ptrName]` = 0, an unnamed mutex is created.

------------------------

[#WaitForMutex] `proc WaitForMutex, .ptrMutex, .timeout`

Waits until the mutex is released and takes the ownership.

Returns:

`CF=0`; the mutex ownership was successfully obtained.

`CF=1`; the timeout expired.


[#MutexRelease] `proc MutexRelease, .ptrMutex`

Releases the ownership of the specified mutex.

-----------------------

[#MutexDestroy] `proc MutexDestroy, .ptrMutex`

Destroys the mutex `[.ptrMutex]`.
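
The whole life cycle of a mutex might look like the sketch below. The variable ListMutex and the label `.timeout` are hypothetical. The sketch assumes that a dword variable is enough to hold the mutex on the target OS and that the timeout is given in milliseconds; neither is stated above:
;begin
    uglobal
      ListMutex dd ?
    endg

            stdcall MutexCreate, 0, ListMutex       ; unnamed mutex, owned by the creating thread
            stdcall MutexRelease, ListMutex         ; let the other threads in

            ; in any thread that touches the shared data:
            stdcall WaitForMutex, ListMutex, 1000
            jc      .timeout                        ; CF=1: the ownership was not obtained
            ; ... access the shared data ...
            stdcall MutexRelease, ListMutex

            ; when the mutex is not needed anymore:
            stdcall MutexDestroy, ListMutex
;end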

-----------------

### "clipboard.asm" library 

[#clipboard.asm] The clipboard.asm library contains very simple clipboard functions that
work only on text data.

------------------

[#ClipboardRead] `proc ClipboardRead`

Returns in EAX a handle to the string with the current clipboard content.
If the clipboard is empty or contains non-textual data, EAX=0.
The user should delete the string when it is no longer needed by passing it to [#StrDel].

--------------------

[#ClipboardWrite] `proc ClipboardWrite, .hstring`

Writes the string `[.hstring]` to the clipboard.
Returns nothing.

-----------------

## FreshLib directory "timers/" 


### "timers.asm" library 

[#timers.asm] This library deals with user-created timers and also contains some
procedures for working with the system time and date.

------------------------------------

[#TTimer] `TTimer` structure.

The timers in FreshLib are represented with the following memory structure:
;begin
    struct TTimer
      .next dd ?

      .interval dd ?
      .value    dd ?


      .flags    dd ?
      .Callback dd ?
      .Expired  dd ?
      .tag      dd ?
    ends
;end
The fields are:

  * `.next` —  Don't change this. It is a pointer to the next timer in the timers chain. It is for internal use only.

  * `.interval` — the interval of the time in ms

  * `.value` — The current value of the timer in ms. When this value becomes higher than `[.interval]` an event is 
fired and the value becomes 0. This value is incremented by the system dependent time step - probably something like 1..100ms

  * `.flags` — contains a set of tmfXXXX flag values. Determines the behavior of the timer. See below for description of the flags.

  * `.Callback` — pointer to the callback procedure of the timer.
                  The callback procedure should accept one argument with the pointer to the timer that fired the event: `proc OnTimer, .ptrTimer`

  * `.Expired`  —  count of the timer expirations, if the callback procedure was not called.

  * `.tag`  —  user defined value associated with the timer.


The `.flags` field can have one or more of the following values:

 * `tmfDoNothing`  — when the timer expires no action should be performed. .Expired field of the timer will be incremented.

 * `tmfCallProc` —   `[TTimer.Callback]` contains pointer to the procedure that to be executed once per timer expiration.

 * `tmfSyncDestroy` — If this flag is set, the timer will be destroyed on the next timer expiration.
                      In this case, the configured event is fired and then the timer is destroyed.
                      The flag is checked after the event returns, so the event handler can reset this flag and thus to prevent destruction.

 * `tmfRunning` — If this flag is set, the timer runs. If the event handler resets this flag, the timer will fire only once and will be suspended.

-----------------------

[#TimerCreate] `proc TimerCreate`

Creates a new timer.

Returns:

 * CF=0; EAX= pointer to the TTimer structure. The timer is created suspended.
         The user can set or reset tmfRunning in `[.flags]` in order to start or stop the timer.
         The user also has to fill in proper values in the remaining fields.

 * CF=1; Error allocating memory.

---------------------- 

[#TimerDestroy] `proc TimerDestroy, .ptrTimer`

Destroys the timer `[.ptrTimer]`
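
A sketch of creating a periodic timer. The procedure OnBlink and the label `.error` are hypothetical, and the sketch assumes that the tmfXXXX values are bit flags that can be combined with `or`:
;begin
            stdcall TimerCreate
            jc      .error                          ; CF=1: error allocating memory

            mov     dword [eax+TTimer.interval], 500
            mov     dword [eax+TTimer.Callback], OnBlink
            mov     dword [eax+TTimer.flags], tmfCallProc or tmfRunning    ; starts the timer

    ; ...

    proc OnBlink, .ptrTimer
    begin
            ; called once per expiration; [.ptrTimer] points to the TTimer structure
            return
    endp
;end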

------------------


## FreshLib directory "simpledebug/"

### "debug.asm" library 

[#simpledebug] This library includes a number of macros and procedures that assist the debugging
of the application. These macros display different data values on the debugging
console.

The library contains its own output procedures, so it does not depend on the other used libraries.

All the macros from this library generate code only when [#options.DebugMode] = 1, so the
programmer can include as many debug statements as needed and leave them in the source.
They will not be included in the final binary.
The debug macros will always preserve all registers, except the EFLAGS register.

-------------

[#DebugMsg] `macro DebugMsg msg`

Displays the text message `msg` to the debug console.
Example:
;begin
     DebugMsg 'The program executes here!'
;end

----------------

[#OutputRegister] `macro OutputRegister reg, radix`

Outputs the content of some register in the given radix. Example:
;begin
    OutputRegister regEAX, 10
;end
The possible register values are: `regEDI`, `regESI`, `regEBP`, `regESP`, `regEBX`, `regEDX`, `regECX`, `regEAX`

----------------

[#OutputMemory] `macro OutputMemory pointer, size`

OutputMemory will dump `[size]` bytes of memory at address `[pointer]`; Example:
;begin
    OutputMemory esi, 128
;end

---------------

[#OutputNumber] `macro OutputNumber number, radix, digits`

Outputs *digits* digits of the *number* in *radix* radix. Example:
;begin
    OutputNumber 12345, 16, 8
;end

------------------

[#GetTimestamp] `proc GetTimestamp`

Returns in EAX a timestamp measured in milliseconds (ms).

-----------------


## FreshLib directory "data/" 

This directory contains several libraries that handle different data structures.
The libraries are mostly OS independent.
The only OS dependent part is one small routine in the Win32 section that
converts strings from UTF-8 to UTF-16, because Windows cannot handle UTF-8 strings
directly.

------------------

### "arrays.asm" library 

This library handles dynamic arrays, containing elements of arbitrary size.
All elements of the array have the same size.

[#TArray] The TArray structure has the following definition:
;begin
    struct TArray
      .count     dd ?
      .capacity  dd ?
      .itemsize  dd ?
      .lparam    dd ?
      label .array dword
    ends
;end
The above structure represents the header of the array. The actual array will
have arbitrary size, depending on the element count and size.

The first element of the array begins at offset *TArray.array* from the beginning
of the memory block.

The field *TArray.count* contains the current element count of the array.

The field *TArray.capacity* contains the current capacity of the array.
The capacity can exceed the element count, because the library usually allocates more
memory than the current elements need.
This approach reduces memory allocations and reallocations and
thus increases the speed of inserting and deleting elements in the array.
How much memory is allocated depends on the user setting of the variable
ResizeIt (defined in memory.asm). This variable contains a pointer to a
procedure that simply increases the value of ECX to the next suitable value.

The field *TArray.itemsize* contains the size in bytes of one array element.
Changing this value is not recommended.

The field *TArray.lparam* holds a user-defined parameter associated with the array.

-----------------------------

[#CreateArray] `proc CreateArray, .itemSize`

This procedure creates a new array with item size `[.itemSize]`.

If the array is created successfully, the procedure returns CF=0 and a pointer to
the array in EAX.

If the memory cannot be allocated, the procedure returns CF=1.
To free the allocated array, use the [#FreeMem] procedure.
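
For example, creating an array of 8-byte items and releasing it later might look
like the sketch below. It assumes the FreshLib `stdcall` macro and that FreeMem
takes the pointer as its only argument; the `.error` label and the use of ESI are
illustrative.
;begin
    stdcall CreateArray, 8         ; 8 bytes per item
    jc      .error                 ; CF=1: the memory could not be allocated
    mov     esi, eax               ; esi = pointer to the new TArray
    ; ... use the array ...
    stdcall FreeMem, esi           ; release the array
;end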

------------------------------

[#AddArrayItems] `proc AddArrayItems, .ptrArray, .count`

This procedure adds `[.count]` new items at the end of the array pointed to by `[.ptrArray]`.

The procedure returns two values:

EAX contains pointer to the first of the new appended elements.

EDX contains a pointer to the array. While the new elements are appended, the
whole array may be moved to a new address in memory, so the programmer should
store the value of EDX for future references to the array.

If the new memory cannot be allocated, the procedure returns CF=1 and
EDX contains the valid pointer to the original array.
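
A minimal sketch of appending one item, assuming ESI holds the array pointer,
the item size is 4 bytes, and `.out_of_memory` is a hypothetical error label:
;begin
    stdcall AddArrayItems, esi, 1
    mov     esi, edx               ; EDX is valid on success and on failure
    jc      .out_of_memory         ; mov does not change CF
    mov     dword [eax], 12345     ; fill the newly appended item
;end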

----------------------------------

[#InsertArrayItems] `proc InsertArrayItems, .ptrArray, .iElement, .count`

This procedure inserts `[.count]` new elements at position `[.iElement]` of the
array pointed to by `[.ptrArray]`.

If `[.iElement]` is greater than or equal to `[TArray.count]`, the elements are
appended at the end of the array (just like AddArrayItems). Otherwise, the elements
from position `[.iElement]` onward are moved to make room for the new elements.

The procedure returns exactly the same results as the AddArrayItems procedure: EDX
points to the array and EAX points to the first of the newly inserted elements.

CF=1 indicates an error.
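
A minimal sketch of inserting two items at the beginning of the array, assuming
ESI holds the array pointer, the item size is 4 bytes, and `.error` is a
hypothetical label:
;begin
    stdcall InsertArrayItems, esi, 0, 2
    jc      .error
    mov     esi, edx               ; the array may have moved
    mov     dword [eax], 1         ; first inserted item
    mov     dword [eax+4], 2       ; second inserted item
;end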

---------------------------------

[#GetArrayItem] `proc GetArrayItem, .array, .item`

CF=0: EAX contains a pointer to the array item with index `[.item]`.

CF=1: The requested item does not exist ( `[.item]` >= `[.array.count]` ). In this case,
EAX contains a pointer to the end of the array, i.e. the byte immediately after the last array element.
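
A minimal sketch of reading one item, assuming ESI holds the array pointer, the
item size is 4 bytes, and `.no_such_item` is a hypothetical label:
;begin
    stdcall GetArrayItem, esi, 3
    jc      .no_such_item          ; the array has no item with index 3
    mov     ebx, [eax]             ; read the item
;end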

---------------------------------

[#DeleteArrayItems] `proc DeleteArrayItems, .ptrArray, .iElement, .count`

This procedure deletes `[.count]` items, starting at index `[.iElement]`, from the
dynamic array `[.ptrArray]`. If the capacity of the array is bigger than the one
recommended for the new count, the array is resized. The recommended size is
calculated using the ResizeIt procedure from the memory library.

Returns in EDX a pointer to the TArray. In most cases this pointer will not
change, but that depends on the memory allocation API of the current OS, so it
is safer to store the returned pointer for future use instead of the one passed
to the procedure.

This procedure cannot fail, because deleting elements is always possible. In
some rare cases it can fail to reallocate a smaller memory block, but this is
not a problem for the consistency of the array.
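
A minimal sketch of deleting the first two items, assuming ESI holds the array
pointer and the array contains at least two items:
;begin
    stdcall DeleteArrayItems, esi, 0, 2    ; remove items 0 and 1
    mov     esi, edx                       ; the array may have moved
;end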