Greetings,

I have been trying to use ICC for vectorization but have been having some serious problems.

Even in very simple situations the compiler does not vectorize code that, in principle, seems straightforward to vectorize. In most cases the vectorization report produces mysterious and (to me useless) remarks. Please see the test case below:

[cpp]
#include <stdlib.h>

void initialize(float * A, float * b, size_t Size) {
    for (size_t row_index = 0 ; row_index < Size ; ++row_index) {
        float * row = A + row_index * Size ;
        for (size_t col = 0 ; col < Size ; ++col) {
            row[col] = 1.0f / float(col + 1) ;
        }
        b[row_index] = row_index ;
    }
}

void matVec(float const * A, float const * x, float * b, size_t Size) {
    for (size_t row_index = 0 ; row_index < Size ; ++row_index) {
        float const * row = A + row_index * Size ;
        float row_accumulator = 0 ;

        // Vectorizes

        for (size_t col = 0 ; col < Size ; ++col) {
            row_accumulator += row[col] * x[col] ;
        }
        b[row_index] = row_accumulator ;
    }
}

// The only difference from matVec is that row_index is an int.

void matVec2(float const * A, float const * x, float * b, size_t Size) {
    for (int row_index = 0 ; row_index < Size ; ++row_index) {
        float const * row = A + row_index * Size ;
        float row_accumulator = 0 ;

        // Does not vectorize:
        // remark: loop was not vectorized: dereference too complex.
        // If I compile with -vec-report=3 I get a bunch of weird remarks
        // regarding flow dependence between row_accumulator and itself.
        // Does this have to do with some unrolling of the outer loop?

        for (unsigned col = 0 ; col < Size ; ++col) {
            row_accumulator += row[col] * x[col] ;
        }
        b[row_index] = row_accumulator ;
    }
}

int main() {
    size_t const Size = 256 ;
    float * A, * b, * x ;

    // I wanted memory aligned to 16 bytes but I am not really
    // even getting to that.
    //
    // posix_memalign takes (memptr, alignment, size in bytes).
    posix_memalign((void**)&A, 16, Size * Size * sizeof(float)) ;
    posix_memalign((void**)&b, 16, Size * sizeof(float)) ;
    posix_memalign((void**)&x, 16, Size * sizeof(float)) ;

    initialize(A, b, Size) ;
    matVec(A, x, b, Size) ;

    return 0 ;
}
[/cpp]

matVec vectorizes well. However, matVec2 does not. The only difference between matVec and matVec2 is the type used for row_index. Looking at the vectorization report I get very cryptic messages. I even get messages saying that the matVec loop was not vectorized ... then that it was vectorized:

simple.cpp(60): (col. 5) remark: loop was not vectorized: not inner loop.

simple.cpp(60): (col. 5) remark: loop was not vectorized: unsupported data type.

simple.cpp(61): (col. 5) remark: loop was not vectorized: not inner loop.

simple.cpp(61): (col. 5) remark: loop was not vectorized: existence of vector dependence.

simple.cpp(61): (col. 5) remark: vector dependence: assumed ANTI dependence between (unknown) line 61 and (unknown) line 61.

simple.cpp(61): (col. 5) remark: vector dependence: assumed FLOW dependence between (unknown) line 61 and (unknown) line 61.

simple.cpp(61): (col. 5) remark: vector dependence: assumed FLOW dependence between (unknown) line 61 and (unknown) line 61.

simple.cpp(61): (col. 5) remark: vector dependence: assumed ANTI dependence between (unknown) line 61 and (unknown) line 61.

simple.cpp(61): (col. 5) remark: vector dependence: assumed ANTI dependence between (unknown) line 61 and (unknown) line 61.

simple.cpp(61): (col. 5) remark: vector dependence: assumed FLOW dependence between (unknown) line 61 and (unknown) line 61.

simple.cpp(61): (col. 5) remark: vector dependence: assumed FLOW dependence between (unknown) line 61 and (unknown) line 61.

simple.cpp(61): (col. 5) remark: vector dependence: assumed ANTI dependence between (unknown) line 61 and (unknown) line 61.

simple.cpp(61): (col. 5) remark: loop was not vectorized: not inner loop.

simple.cpp(61): (col. 5) remark: LOOP WAS VECTORIZED.

simple.cpp(14): (col. 5) remark: loop was not vectorized: not inner loop.

**simple.cpp(20): (col. 9) remark: loop was not vectorized: existence of vector dependence.**

simple.cpp(21): (col. 13) remark: vector dependence: assumed ANTI dependence between row_accumulator line 21 and row_accumulator line 21.

simple.cpp(21): (col. 13) remark: vector dependence: assumed FLOW dependence between row_accumulator line 21 and row_accumulator line 21.

simple.cpp(21): (col. 13) remark: vector dependence: assumed FLOW dependence between row_accumulator line 21 and row_accumulator line 21.

simple.cpp(21): (col. 13) remark: vector dependence: assumed ANTI dependence between row_accumulator line 21 and row_accumulator line 21.

simple.cpp(21): (col. 13) remark: vector dependence: assumed ANTI dependence between row_accumulator line 21 and row_accumulator line 21.

simple.cpp(21): (col. 13) remark: vector dependence: assumed FLOW dependence between row_accumulator line 21 and row_accumulator line 21.

simple.cpp(21): (col. 13) remark: vector dependence: assumed FLOW dependence between row_accumulator line 21 and row_accumulator line 21.

simple.cpp(21): (col. 13) remark: vector dependence: assumed ANTI dependence between row_accumulator line 21 and row_accumulator line 21.

simple.cpp(14): (col. 5) remark: loop was not vectorized: not inner loop.

**simple.cpp(20): (col. 9) remark: LOOP WAS VECTORIZED.**

simple.cpp(31): (col. 5) remark: loop was not vectorized: not inner loop.

simple.cpp(41): (col. 9) remark: loop was not vectorized: existence of vector dependence.

simple.cpp(42): (col. 13) remark: vector dependence: assumed ANTI dependence between row_accumulator line 42 and row_accumulator line 42.

simple.cpp(42): (col. 13) remark: vector dependence: assumed FLOW dependence between row_accumulator line 42 and row_accumulator line 42.

simple.cpp(42): (col. 13) remark: vector dependence: assumed FLOW dependence between row_accumulator line 42 and row_accumulator line 42.

simple.cpp(42): (col. 13) remark: vector dependence: assumed ANTI dependence between row_accumulator line 42 and row_accumulator line 42.

simple.cpp(42): (col. 13) remark: vector dependence: assumed ANTI dependence between row_accumulator line 42 and row_accumulator line 42.

simple.cpp(42): (col. 13) remark: vector dependence: assumed FLOW dependence between row_accumulator line 42 and row_accumulator line 42.

simple.cpp(42): (col. 13) remark: vector dependence: assumed FLOW dependence between row_accumulator line 42 and row_accumulator line 42.

simple.cpp(42): (col. 13) remark: vector dependence: assumed ANTI dependence between row_accumulator line 42 and row_accumulator line 42.

simple.cpp(31): (col. 5) remark: loop was not vectorized: not inner loop.

**simple.cpp(42): (col. 32) remark: loop was not vectorized: dereference too complex.**

simple.cpp(4): (col. 5) remark: loop was not vectorized: not inner loop.

simple.cpp(6): (col. 9) remark: loop was not vectorized: unsupported data type.

These messages don't tell me anything about the "row_index int".

I am compiling this code with:

icpc (ICC) 11.1 20090630

Copyright (C) 1985-2009 Intel Corporation. All rights reserved.

icpc simple.cpp -O3 -xW -fp-model fast -o simple -vec-report=3

My personal preference would be to compile with -fno-inline-functions so as to clear up as much vectorization as possible before dealing with in-lining.

1) The unexpected, confusing dependence message on the sum-reduction variable is the result of an optimizer bug. A bug report has been submitted. In the meantime, you can use

#pragma unroll_and_jam(0)

for (int row_index = 0; ....)

to get round that bug.

2) In general, address computation that involves 64-bit integers/pointers and 32-bit unsigned integers is rather difficult for the compiler to deal with. With respect to integral conversions, the language definition is more relaxed for the signed types (by saying the result is implementation-defined when the value exceeds the range), and that difference can determine whether the code gets optimized or not. If you write the inner loop of matVec2() as in

for (int col = 0; ....)

row_accumulator += A[row_index * Size + col] * x[col] ;

and change the type of Size to "int", the compiler should be able to auto-vectorize it.
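Putting the two changes from point 2 together, a minimal sketch of the rewritten function might look like the following. The name matVecInt is hypothetical (it is not part of the original test case), and whether a given compiler version actually vectorizes it still has to be checked in the vectorization report.

```cpp
#include <cassert>

// Hypothetical variant of matVec2 (the name matVecInt is illustrative only):
// Size, row_index and col are all plain int, and A is indexed directly
// instead of going through a per-row pointer.
void matVecInt(float const * A, float const * x, float * b, int Size) {
    assert(Size >= 0) ;
    for (int row_index = 0 ; row_index < Size ; ++row_index) {
        float row_accumulator = 0.0f ;
        for (int col = 0 ; col < Size ; ++col) {
            // Note: row_index * Size is an int multiplication, so this
            // assumes Size * Size fits in an int.
            row_accumulator += A[row_index * Size + col] * x[col] ;
        }
        b[row_index] = row_accumulator ;
    }
}
```

The trade-off of this form is that all index arithmetic happens in 32-bit signed integers, so it is only suitable when the matrix has fewer than 2^31 elements.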

We are continuously improving our analysis so that we can capture as much as the language definition allows.
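For the workaround in point 1, a sketch of where the pragma would sit in the original matVec2 is shown below. The function name matVec2Workaround and the __INTEL_COMPILER guard are assumptions added here so that other compilers simply skip the Intel-specific pragma; the loop bodies are unchanged from the test case.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical copy of matVec2 with the suggested pragma on the outer loop.
// The guard is an assumption of this sketch: the pragma is only emitted
// when building with the Intel compiler.
void matVec2Workaround(float const * A, float const * x, float * b, size_t Size) {
    assert(A && x && b) ;
#if defined(__INTEL_COMPILER)
    #pragma unroll_and_jam(0)
#endif
    for (int row_index = 0 ; row_index < Size ; ++row_index) {
        float const * row = A + row_index * Size ;
        float row_accumulator = 0 ;
        // Inner loop kept exactly as in the original test case.
        for (unsigned col = 0 ; col < Size ; ++col) {
            row_accumulator += row[col] * x[col] ;
        }
        b[row_index] = row_accumulator ;
    }
}
```

This only addresses the spurious dependence remarks from point 1; the mixed int/unsigned/size_t indexing discussed in point 2 is left as it was.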

Try using explicit pragmas (such as distribution, unroll_and_jam, etc.) rather than trusting completely what -O3 does. No doubt -O3 does some minimal auto-vectorization, but check the compilation process by keeping a log file, and afterwards check with objdump which SSE instructions were actually generated. Sometimes the compiler reports structural dependencies that are not very convincing to a programmer, so the best approach is to check the objdump output.

Out of curiosity, why are you using "-fp-model fast"? Could you try other "-fp-model" settings, especially "-fp-model precise", and see whether performance changes?

Does using -xW help you in any way?

~BR

The issue is resolved in Intel Composer XE 16.0.
