Can a C# statement generate non-connected MSIL?

April 2019

Viewed 196 times

This question is about the C# language specification and the CIL language specification, and also about how the Microsoft and Mono C# compilers behave.

I am building some code analysis tools (what they do doesn't matter) that work on CIL.

Looking at a few code examples, I noticed that statements (try/catch, if-else, if-then, loops, ...) generate connected blocks of MSIL.

But I would like to make sure that I cannot write a C# construct that produces non-connected MSIL. More specifically, can I write a C# statement that translates to something like this:

IL_0000: 
IL_0001: 
IL_0002: 

// hole

IL_001a: 
IL_001b:

I have already tried some weird things with goto and nested loops, but maybe I am not as crazy as some users will be.

2 Answers

In theory, yes (this comes from my experience). Your analysis tool does not deal with C# directly; it works only on IL code. IL can be produced by anybody: not only by Visual Studio, but also by compilers for other languages such as Visual Basic or Python.NET... and by obfuscators! Obfuscators are the real culprit: while other compilers try to adhere to the specs, obfuscators do their best to exploit the specs and the target runtime.

Obfuscated code may violate common-sense patterns. Consider this case: some smart obfuscators produce illegal MSIL, but the jitter accepts it because the invalid portions are never actually executed.
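For an IL-level tool, one practical consequence is that not every byte in a method body is necessarily reachable. Below is a minimal sketch of a reachability pass. It assumes the instructions have already been decoded into a hypothetical `Instr` record holding the offset, the size, whether control falls through, and the explicit branch targets. The pass walks from offset 0 and reports every instruction that no path reaches. It deliberately ignores `switch` tables, exception-handler entry points, and everything else a real IL decoder must handle, so it is an illustration, not a working analyzer.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical, simplified instruction model. Real IL decoding (operand
// sizes, switch tables, exception-handler entry points) is left out.
public sealed record Instr(int Offset, int Size, bool FallsThrough, int[] Targets);

public static class Reachability
{
    // Returns the offsets of instructions that no path from offset 0 reaches.
    public static List<int> Unreachable(IReadOnlyList<Instr> body)
    {
        var byOffset = body.ToDictionary(i => i.Offset);
        var seen = new HashSet<int>();
        var work = new Stack<int>();
        work.Push(0);

        while (work.Count > 0)
        {
            int off = work.Pop();
            // Skip offsets that are not instruction starts or were already visited.
            if (!byOffset.TryGetValue(off, out var ins) || !seen.Add(off))
                continue;

            // Follow explicit branch targets...
            foreach (int target in ins.Targets)
                work.Push(target);

            // ...and the fall-through successor, if control can continue there.
            if (ins.FallsThrough)
                work.Push(off + ins.Size);
        }

        return body.Where(i => !seen.Contains(i.Offset))
                   .Select(i => i.Offset)
                   .OrderBy(o => o)
                   .ToList();
    }
}
```

For example, if a `br.s` at offset 0 jumps over a single junk byte at offset 2 to a `ret` at offset 3, `Unreachable` returns just offset 2. A tool could flag such bytes instead of trying to interpret them.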

When building an analysis tool, you can't handle these cases unless your goal is to build a deobfuscator.

Sure, that's trivially possible. Something like:

static void M(bool x)
{
    if (x)
        return;
    else
        M(x);
    return;
}

If you compile that in debug mode, you get:

    IL_0000: nop
    IL_0001: ldarg.0
    IL_0002: stloc.0
    IL_0003: ldloc.0
    IL_0004: brfalse.s IL_0008
    IL_0006: br.s IL_0011
    IL_0008: ldarg.0
    IL_0009: call void A::M(bool)
    IL_000e: nop
    IL_000f: br.s IL_0011
    IL_0011: ret

The if statement goes from 0001 to 0009, and the consequence of the if is a goto to 0011; both return statements are the same code, so there is a "hole" containing a nop and an unconditional branch between the main body of the if and the consequence.
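The "hole" can be made concrete with a little offset arithmetic. The helper below is hypothetical (`Layout.Holes` is not part of any library). It takes the (offset, size) pairs of the instructions attributed to one statement and returns the gaps between them. Following the answer, the sketch assumes that the if statement owns IL_0001 through IL_0009 plus the shared `ret` at IL_0011. The instruction sizes are read off the listing; for example, `call` occupies IL_0009 through IL_000d.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class Layout
{
    // Given the (Offset, Size) pairs of the instructions attributed to one
    // statement, returns each gap as a half-open range [Start, End).
    // An empty result means the statement occupies one contiguous block.
    public static List<(int Start, int End)> Holes(IEnumerable<(int Offset, int Size)> statementCode)
    {
        var holes = new List<(int Start, int End)>();
        int? prevEnd = null;

        foreach (var ins in statementCode.OrderBy(i => i.Offset))
        {
            // A gap exists when this instruction starts after the previous one ended.
            if (prevEnd is int end && ins.Offset > end)
                holes.Add((end, ins.Offset));
            prevEnd = ins.Offset + ins.Size;
        }

        return holes;
    }
}
```

With those eight instructions as input, `Holes` returns a single gap, `[0x0e, 0x11)`. That gap is exactly the `nop` and `br.s` between the main body of the if and its consequence.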

More generally, you should never assume anything whatsoever about the layout of the IL produced by the C# compiler. The compiler makes no guarantees whatsoever other than that the IL produced will be legal and, if safe, verifiable.


You say you are writing some code analysis tools; as the author of significant portions of the C# analyzer, and someone who worked on third-party analysis tools at Coverity, a word of advice: for the majority of questions you typically want answered about C# programs, the parse tree produced by Roslyn is the entity you wish to analyze, not the IL. The parse tree is a concrete syntax tree; it is one-to-one with every character in the source code. It can be very difficult to map optimized IL back to the original source code, and it can be very easy to produce false positives in an IL analysis.

Put another way: source-to-IL is semantics-preserving but also information-losing; you typically want to analyze the artifact that has the most information in it.