Package com.googlecode.javaewah32
Class EWAHCompressedBitmap32
- java.lang.Object
-
- com.googlecode.javaewah32.EWAHCompressedBitmap32
-
- All Implemented Interfaces:
LogicalElement<EWAHCompressedBitmap32>
,BitmapStorage32
,java.io.Externalizable
,java.io.Serializable
,java.lang.Cloneable
,java.lang.Iterable<java.lang.Integer>
public final class EWAHCompressedBitmap32 extends java.lang.Object implements java.lang.Cloneable, java.io.Externalizable, java.lang.Iterable<java.lang.Integer>, BitmapStorage32, LogicalElement<EWAHCompressedBitmap32>
This implements the patent-free EWAH scheme. Roughly speaking, it is a 32-bit variant of the BBC compression scheme used by Oracle for its bitmap indexes.
In contrast with the 64-bit EWAH scheme (javaewah.EWAHCompressedBitmap), you can expect this class to compress better, but to be slower at processing the data. In effect, there is a trade-off between memory usage and performances.
- Since:
- 0.5.0
- See Also:
The objective of this compression type is to provide some compression, while reducing as much as possible the CPU cycle usage. For more details, see the following paper: Daniel Lemire, Owen Kaser, Kamel Aouiche, Sorting improves word-aligned bitmap indexes. Data Knowledge Engineering 69 (1), pages 3-28, 2010. http://arxiv.org/abs/0901.3751
, Serialized Form
-
-
Field Summary
Fields Modifier and Type Field Description static boolean
adjustContainerSizeWhenAggregating
whether we adjust after some aggregation by adding in zeroesstatic boolean
usetrailingzeros
optimization optionstatic int
wordinbits
The Constant wordinbits represents the number of bits in a int.
-
Constructor Summary
Constructors Constructor Description EWAHCompressedBitmap32()
Creates an empty bitmap (no bit set to true).EWAHCompressedBitmap32(int buffersize)
Sets explicitly the buffer size (in 32-bit words).
-
Method Summary
All Methods Static Methods Instance Methods Concrete Methods Modifier and Type Method Description void
add(int newdata)
Adding words directly to the bitmap (for expert use).void
add(int newdata, int bitsthatmatter)
Adding words directly to the bitmap (for expert use).void
addStreamOfEmptyWords(boolean v, int number)
For experts: You want to add many zeroes or ones? This is the method you use.void
addStreamOfLiteralWords(int[] data, int start, int number)
if you have several literal words to copy over, this might be faster.void
addStreamOfNegatedLiteralWords(int[] data, int start, int number)
Same as addStreamOfLiteralWords, but the words are negated.EWAHCompressedBitmap32
and(EWAHCompressedBitmap32 a)
Returns a new compressed bitmap containing the bitwise AND values of the current bitmap with some other bitmap.static EWAHCompressedBitmap32
and(EWAHCompressedBitmap32... bitmaps)
Returns a new compressed bitmap containing the bitwise AND values of the provided bitmaps.int
andCardinality(EWAHCompressedBitmap32 a)
Returns the cardinality of the result of a bitwise AND of the values of the current bitmap with some other bitmap.static int
andCardinality(EWAHCompressedBitmap32... bitmaps)
Returns the cardinality of the result of a bitwise AND of the values of the provided bitmaps.EWAHCompressedBitmap32
andNot(EWAHCompressedBitmap32 a)
Returns a new compressed bitmap containing the bitwise AND NOT values of the current bitmap with some other bitmap.int
andNotCardinality(EWAHCompressedBitmap32 a)
Returns the cardinality of the result of a bitwise AND NOT of the values of the current bitmap with some other bitmap.void
andNotToContainer(EWAHCompressedBitmap32 a, BitmapStorage32 container)
Returns a new compressed bitmap containing the bitwise AND NOT values of the current bitmap with some other bitmap.void
andToContainer(EWAHCompressedBitmap32 a, BitmapStorage32 container)
Computes new compressed bitmap containing the bitwise AND values of the current bitmap with some other bitmap.static void
andWithContainer(BitmapStorage32 container, EWAHCompressedBitmap32... bitmaps)
For internal use.static EWAHCompressedBitmap32
bitmapOf(int... setbits)
Return a bitmap with the bit set to true at the given positions.int
cardinality()
reports the number of bits set to true.void
clear()
Clear any set bits and set size in bits back to 0EWAHCompressedBitmap32
clone()
void
deserialize(java.io.DataInput in)
Deserialize.boolean
equals(java.lang.Object o)
Check to see whether the two compressed bitmaps contain the same set bits.boolean
get(int i)
Query the value of a single bit.EWAHIterator32
getEWAHIterator()
Gets an EWAHIterator over the data.IteratingRLW32
getIteratingRLW()
java.util.List<java.lang.Integer>
getPositions()
get the locations of the true values as one vector.int
hashCode()
Returns a customized hash code (based on Karp-Rabin).boolean
intersects(EWAHCompressedBitmap32 a)
Return true if the two EWAHCompressedBitmap have both at least one true bit in the same position.IntIterator
intIterator()
Iterator over the set bits (this is what most people will want to use to browse the content if they want an iterator).java.util.Iterator<java.lang.Integer>
iterator()
iterate over the positions of the true values.void
not()
Negate (bitwise) the current bitmap.EWAHCompressedBitmap32
or(EWAHCompressedBitmap32 a)
Returns a new compressed bitmap containing the bitwise OR values of the current bitmap with some other bitmap.static EWAHCompressedBitmap32
or(EWAHCompressedBitmap32... bitmaps)
Returns a new compressed bitmap containing the bitwise OR values of the provided bitmaps.int
orCardinality(EWAHCompressedBitmap32 a)
Returns the cardinality of the result of a bitwise OR of the values of the current bitmap with some other bitmap.static int
orCardinality(EWAHCompressedBitmap32... bitmaps)
Returns the cardinality of the result of a bitwise OR of the values of the provided bitmaps.void
orToContainer(EWAHCompressedBitmap32 a, BitmapStorage32 container)
Computes the bitwise or between the current bitmap and the bitmap "a".static void
orWithContainer(BitmapStorage32 container, EWAHCompressedBitmap32... bitmaps)
For internal use.void
readExternal(java.io.ObjectInput in)
void
serialize(java.io.DataOutput out)
Serialize.int
serializedSizeInBytes()
Report the size required to serialize this bitmapboolean
set(int i)
Set the bit at position i to true, the bits must be set in (strictly) increasing order.void
setSizeInBits(int size)
Set the size in bits.boolean
setSizeInBits(int size, boolean defaultvalue)
Change the reported size in bits of the *uncompressed* bitmap represented by this compressed bitmap.int
sizeInBits()
Returns the size in bits of the *uncompressed* bitmap represented by this compressed bitmap.int
sizeInBytes()
Report the *compressed* size of the bitmap (equivalent to memory usage, after accounting for some overhead).void
swap(EWAHCompressedBitmap32 other)
swap the content of the bitmap with another.int[]
toArray()
Populate an array of (sorted integers) corresponding to the location of the set bits.java.lang.String
toDebugString()
A more detailed string describing the bitmap (useful for debugging).java.lang.String
toString()
A string describing the bitmap.void
trim()
Reduce the internal buffer to its minimal allowable size (given by this.actualsizeinwords).void
writeExternal(java.io.ObjectOutput out)
EWAHCompressedBitmap32
xor(EWAHCompressedBitmap32 a)
Returns a new compressed bitmap containing the bitwise XOR values of the current bitmap with some other bitmap.static EWAHCompressedBitmap32
xor(EWAHCompressedBitmap32... bitmaps)
Returns a new compressed bitmap containing the bitwise XOR values of the provided bitmaps.int
xorCardinality(EWAHCompressedBitmap32 a)
Returns the cardinality of the result of a bitwise XOR of the values of the current bitmap with some other bitmap.void
xorToContainer(EWAHCompressedBitmap32 a, BitmapStorage32 container)
Computes a new compressed bitmap containing the bitwise XOR values of the current bitmap with some other bitmap.static void
xorWithContainer(BitmapStorage32 container, EWAHCompressedBitmap32... bitmaps)
For internal use.
-
-
-
Field Detail
-
usetrailingzeros
public static final boolean usetrailingzeros
optimization option- See Also:
- Constant Field Values
-
adjustContainerSizeWhenAggregating
public static final boolean adjustContainerSizeWhenAggregating
whether we adjust after some aggregation by adding in zeroes- See Also:
- Constant Field Values
-
wordinbits
public static final int wordinbits
The Constant wordinbits represents the number of bits in a int.- See Also:
- Constant Field Values
-
-
Constructor Detail
-
EWAHCompressedBitmap32
public EWAHCompressedBitmap32()
Creates an empty bitmap (no bit set to true).
-
EWAHCompressedBitmap32
public EWAHCompressedBitmap32(int buffersize)
Sets explicitly the buffer size (in 32-bit words). The initial memory usage will be "buffersize * 32". For large poorly compressible bitmaps, using large values may improve performance.- Parameters:
buffersize
- number of 32-bit words reserved when the object is created)
-
-
Method Detail
-
add
public void add(int newdata)
Adding words directly to the bitmap (for expert use). This is normally how you add data to the array. So you add bits in streams of 4*8 bits. Example: if you add 321, you are have added (in binary notation) 0b101000001, so you have effectively called set(0), set(6), set(8) in sequence.- Specified by:
add
in interfaceBitmapStorage32
- Parameters:
newdata
- the word
-
add
public void add(int newdata, int bitsthatmatter)
Adding words directly to the bitmap (for expert use).- Parameters:
newdata
- the wordbitsthatmatter
- the number of significant bits (by default it should be 32)
-
addStreamOfLiteralWords
public void addStreamOfLiteralWords(int[] data, int start, int number)
if you have several literal words to copy over, this might be faster.- Specified by:
addStreamOfLiteralWords
in interfaceBitmapStorage32
- Parameters:
data
- the literal wordsstart
- the starting point in the arraynumber
- the number of literal words to add
-
addStreamOfEmptyWords
public void addStreamOfEmptyWords(boolean v, int number)
For experts: You want to add many zeroes or ones? This is the method you use.- Specified by:
addStreamOfEmptyWords
in interfaceBitmapStorage32
- Parameters:
v
- the boolean valuenumber
- the number
-
addStreamOfNegatedLiteralWords
public void addStreamOfNegatedLiteralWords(int[] data, int start, int number)
Same as addStreamOfLiteralWords, but the words are negated.- Specified by:
addStreamOfNegatedLiteralWords
in interfaceBitmapStorage32
- Parameters:
data
- the literal wordsstart
- the starting point in the arraynumber
- the number of literal words to add
-
and
public EWAHCompressedBitmap32 and(EWAHCompressedBitmap32 a)
Returns a new compressed bitmap containing the bitwise AND values of the current bitmap with some other bitmap. The running time is proportional to the sum of the compressed sizes (as reported by sizeInBytes()). If you are not planning on adding to the resulting bitmap, you may call the trim() method to reduce memory usage.- Specified by:
and
in interfaceLogicalElement<EWAHCompressedBitmap32>
- Parameters:
a
- the other bitmap- Returns:
- the EWAH compressed bitmap
-
andToContainer
public void andToContainer(EWAHCompressedBitmap32 a, BitmapStorage32 container)
Computes new compressed bitmap containing the bitwise AND values of the current bitmap with some other bitmap. The running time is proportional to the sum of the compressed sizes (as reported by sizeInBytes()).- Parameters:
a
- the other bitmapcontainer
- where we store the result- Since:
- 0.4.0
-
andCardinality
public int andCardinality(EWAHCompressedBitmap32 a)
Returns the cardinality of the result of a bitwise AND of the values of the current bitmap with some other bitmap. Avoids needing to allocate an intermediate bitmap to hold the result of the OR.- Parameters:
a
- the other bitmap- Returns:
- the cardinality
-
andNot
public EWAHCompressedBitmap32 andNot(EWAHCompressedBitmap32 a)
Returns a new compressed bitmap containing the bitwise AND NOT values of the current bitmap with some other bitmap. The running time is proportional to the sum of the compressed sizes (as reported by sizeInBytes()). If you are not planning on adding to the resulting bitmap, you may call the trim() method to reduce memory usage.- Specified by:
andNot
in interfaceLogicalElement<EWAHCompressedBitmap32>
- Parameters:
a
- the other bitmap- Returns:
- the EWAH compressed bitmap
-
andNotToContainer
public void andNotToContainer(EWAHCompressedBitmap32 a, BitmapStorage32 container)
Returns a new compressed bitmap containing the bitwise AND NOT values of the current bitmap with some other bitmap. The running time is proportional to the sum of the compressed sizes (as reported by sizeInBytes()).- Parameters:
a
- the other bitmapcontainer
- where we store the result
-
andNotCardinality
public int andNotCardinality(EWAHCompressedBitmap32 a)
Returns the cardinality of the result of a bitwise AND NOT of the values of the current bitmap with some other bitmap. Avoids needing to allocate an intermediate bitmap to hold the result of the OR.- Parameters:
a
- the other bitmap- Returns:
- the cardinality
-
cardinality
public int cardinality()
reports the number of bits set to true. Running time is proportional to compressed size (as reported by sizeInBytes).- Returns:
- the number of bits set to true
-
clear
public void clear()
Clear any set bits and set size in bits back to 0
-
clone
public EWAHCompressedBitmap32 clone() throws java.lang.CloneNotSupportedException
- Overrides:
clone
in classjava.lang.Object
- Throws:
java.lang.CloneNotSupportedException
-
deserialize
public void deserialize(java.io.DataInput in) throws java.io.IOException
Deserialize.- Parameters:
in
- the DataInput stream- Throws:
java.io.IOException
- Signals that an I/O exception has occurred.
-
equals
public boolean equals(java.lang.Object o)
Check to see whether the two compressed bitmaps contain the same set bits.- Overrides:
equals
in classjava.lang.Object
- See Also:
Object.equals(java.lang.Object)
-
getEWAHIterator
public EWAHIterator32 getEWAHIterator()
Gets an EWAHIterator over the data. This is a customized iterator which iterates over run length word. For experts only.- Returns:
- the EWAHIterator
-
getIteratingRLW
public IteratingRLW32 getIteratingRLW()
- Returns:
- the IteratingRLW iterator corresponding to this bitmap
-
getPositions
public java.util.List<java.lang.Integer> getPositions()
get the locations of the true values as one vector. (may use more memory than iterator())- Returns:
- the positions
-
hashCode
public int hashCode()
Returns a customized hash code (based on Karp-Rabin). Naturally, if the bitmaps are equal, they will hash to the same value.- Overrides:
hashCode
in classjava.lang.Object
-
intersects
public boolean intersects(EWAHCompressedBitmap32 a)
Return true if the two EWAHCompressedBitmap have both at least one true bit in the same position. Equivalently, you could call "and" and check whether there is a set bit, but intersects will run faster if you don't need the result of the "and" operation.- Parameters:
a
- the other bitmap- Returns:
- whether they intersect
-
intIterator
public IntIterator intIterator()
Iterator over the set bits (this is what most people will want to use to browse the content if they want an iterator). The location of the set bits is returned, in increasing order.- Returns:
- the int iterator
-
iterator
public java.util.Iterator<java.lang.Integer> iterator()
iterate over the positions of the true values. This is similar to intIterator(), but it uses Java generics.- Specified by:
iterator
in interfacejava.lang.Iterable<java.lang.Integer>
- Returns:
- the iterator
-
not
public void not()
Negate (bitwise) the current bitmap. To get a negated copy, do EWAHCompressedBitmap x= ((EWAHCompressedBitmap) mybitmap.clone()); x.not(); The running time is proportional to the compressed size (as reported by sizeInBytes()).- Specified by:
not
in interfaceLogicalElement<EWAHCompressedBitmap32>
-
or
public EWAHCompressedBitmap32 or(EWAHCompressedBitmap32 a)
Returns a new compressed bitmap containing the bitwise OR values of the current bitmap with some other bitmap. The running time is proportional to the sum of the compressed sizes (as reported by sizeInBytes()). If you are not planning on adding to the resulting bitmap, you may call the trim() method to reduce memory usage.- Specified by:
or
in interfaceLogicalElement<EWAHCompressedBitmap32>
- Parameters:
a
- the other bitmap- Returns:
- the EWAH compressed bitmap
-
orToContainer
public void orToContainer(EWAHCompressedBitmap32 a, BitmapStorage32 container)
Computes the bitwise or between the current bitmap and the bitmap "a". Stores the result in the container.- Parameters:
a
- the other bitmapcontainer
- where we store the result
-
orCardinality
public int orCardinality(EWAHCompressedBitmap32 a)
Returns the cardinality of the result of a bitwise OR of the values of the current bitmap with some other bitmap. Avoids needing to allocate an intermediate bitmap to hold the result of the OR.- Parameters:
a
- the other bitmap- Returns:
- the cardinality
-
readExternal
public void readExternal(java.io.ObjectInput in) throws java.io.IOException
- Specified by:
readExternal
in interfacejava.io.Externalizable
- Throws:
java.io.IOException
-
serialize
public void serialize(java.io.DataOutput out) throws java.io.IOException
Serialize.- Parameters:
out
- the DataOutput stream- Throws:
java.io.IOException
- Signals that an I/O exception has occurred.
-
serializedSizeInBytes
public int serializedSizeInBytes()
Report the size required to serialize this bitmap- Returns:
- the size in bytes
-
get
public boolean get(int i)
Query the value of a single bit. Relying on this method when speed is needed is discouraged. The complexity is linear with the size of the bitmap. (This implementation is based on zhenjl's Go version of JavaEWAH.)- Parameters:
i
- the bit we are interested in- Returns:
- whether the bit is set to true
-
set
public boolean set(int i)
Set the bit at position i to true, the bits must be set in (strictly) increasing order. For example, set(15) and then set(7) will fail. You must do set(7) and then set(15).- Parameters:
i
- the index- Returns:
- true if the value was set (always true when i is greater or equal to sizeInBits()).
- Throws:
java.lang.IndexOutOfBoundsException
- if i is negative or greater than Integer.MAX_VALUE - 32
-
setSizeInBits
public void setSizeInBits(int size)
Set the size in bits. This does not change the compressed bitmap.- Specified by:
setSizeInBits
in interfaceBitmapStorage32
- Parameters:
size
- number of bits
-
setSizeInBits
public boolean setSizeInBits(int size, boolean defaultvalue)
Change the reported size in bits of the *uncompressed* bitmap represented by this compressed bitmap. It may change the underlying compressed bitmap. It is not possible to reduce the sizeInBits, but it can be extended. The new bits are set to false or true depending on the value of defaultvalue.- Parameters:
size
- the size in bitsdefaultvalue
- the default boolean value- Returns:
- true if the update was possible
-
sizeInBits
public int sizeInBits()
Returns the size in bits of the *uncompressed* bitmap represented by this compressed bitmap. Initially, the sizeInBits is zero. It is extended automatically when you set bits to true.- Specified by:
sizeInBits
in interfaceLogicalElement<EWAHCompressedBitmap32>
- Returns:
- the size in bits
-
sizeInBytes
public int sizeInBytes()
Report the *compressed* size of the bitmap (equivalent to memory usage, after accounting for some overhead).- Specified by:
sizeInBytes
in interfaceLogicalElement<EWAHCompressedBitmap32>
- Returns:
- the size in bytes
-
toArray
public int[] toArray()
Populate an array of (sorted integers) corresponding to the location of the set bits.- Returns:
- the array containing the location of the set bits
-
toDebugString
public java.lang.String toDebugString()
A more detailed string describing the bitmap (useful for debugging).- Returns:
- the string
-
toString
public java.lang.String toString()
A string describing the bitmap.- Overrides:
toString
in classjava.lang.Object
- Returns:
- the string
-
swap
public void swap(EWAHCompressedBitmap32 other)
swap the content of the bitmap with another.- Parameters:
other
- bitmap to swap with
-
trim
public void trim()
Reduce the internal buffer to its minimal allowable size (given by this.actualsizeinwords). This can free memory.
-
writeExternal
public void writeExternal(java.io.ObjectOutput out) throws java.io.IOException
- Specified by:
writeExternal
in interfacejava.io.Externalizable
- Throws:
java.io.IOException
-
xor
public EWAHCompressedBitmap32 xor(EWAHCompressedBitmap32 a)
Returns a new compressed bitmap containing the bitwise XOR values of the current bitmap with some other bitmap. The running time is proportional to the sum of the compressed sizes (as reported by sizeInBytes()). If you are not planning on adding to the resulting bitmap, you may call the trim() method to reduce memory usage.- Specified by:
xor
in interfaceLogicalElement<EWAHCompressedBitmap32>
- Parameters:
a
- the other bitmap- Returns:
- the EWAH compressed bitmap
-
xorToContainer
public void xorToContainer(EWAHCompressedBitmap32 a, BitmapStorage32 container)
Computes a new compressed bitmap containing the bitwise XOR values of the current bitmap with some other bitmap. The running time is proportional to the sum of the compressed sizes (as reported by sizeInBytes()).- Parameters:
a
- the other bitmapcontainer
- where we store the result
-
xorCardinality
public int xorCardinality(EWAHCompressedBitmap32 a)
Returns the cardinality of the result of a bitwise XOR of the values of the current bitmap with some other bitmap. Avoids needing to allocate an intermediate bitmap to hold the result of the OR.- Parameters:
a
- the other bitmap- Returns:
- the cardinality
-
andWithContainer
public static void andWithContainer(BitmapStorage32 container, EWAHCompressedBitmap32... bitmaps)
For internal use. Computes the bitwise and of the provided bitmaps and stores the result in the container.- Parameters:
container
- where the result is storedbitmaps
- bitmaps to AND
-
and
public static EWAHCompressedBitmap32 and(EWAHCompressedBitmap32... bitmaps)
Returns a new compressed bitmap containing the bitwise AND values of the provided bitmaps. It may or may not be faster than doing the aggregation two-by-two (A.and(B).and(C)). If only one bitmap is provided, it is returned as is. If you are not planning on adding to the resulting bitmap, you may call the trim() method to reduce memory usage.- Parameters:
bitmaps
- bitmaps to AND together- Returns:
- result of the AND
-
andCardinality
public static int andCardinality(EWAHCompressedBitmap32... bitmaps)
Returns the cardinality of the result of a bitwise AND of the values of the provided bitmaps. Avoids needing to allocate an intermediate bitmap to hold the result of the AND.- Parameters:
bitmaps
- bitmaps to AND- Returns:
- the cardinality
-
bitmapOf
public static EWAHCompressedBitmap32 bitmapOf(int... setbits)
Return a bitmap with the bit set to true at the given positions. The positions should be given in sorted order. (This is a convenience method.)- Parameters:
setbits
- list of set bit positions- Returns:
- the bitmap
- Since:
- 0.4.5
-
orWithContainer
public static void orWithContainer(BitmapStorage32 container, EWAHCompressedBitmap32... bitmaps)
For internal use. Computes the bitwise or of the provided bitmaps and stores the result in the container.- Parameters:
container
- where store the resultbitmaps
- to be aggregated
-
xorWithContainer
public static void xorWithContainer(BitmapStorage32 container, EWAHCompressedBitmap32... bitmaps)
For internal use. Computes the bitwise xor of the provided bitmaps and stores the result in the container.- Parameters:
container
- where store the resultbitmaps
- to be aggregated
-
or
public static EWAHCompressedBitmap32 or(EWAHCompressedBitmap32... bitmaps)
Returns a new compressed bitmap containing the bitwise OR values of the provided bitmaps. This is typically faster than doing the aggregation two-by-two (A.or(B).or(C).or(D)). If only one bitmap is provided, it is returned as is. If you are not planning on adding to the resulting bitmap, you may call the trim() method to reduce memory usage.- Parameters:
bitmaps
- bitmaps to OR together- Returns:
- result of the OR
-
xor
public static EWAHCompressedBitmap32 xor(EWAHCompressedBitmap32... bitmaps)
Returns a new compressed bitmap containing the bitwise XOR values of the provided bitmaps. This is typically faster than doing the aggregation two-by-two (A.xor(B).xor(C).xor(D)). If only one bitmap is provided, it is returned as is. If you are not planning on adding to the resulting bitmap, you may call the trim() method to reduce memory usage.- Parameters:
bitmaps
- bitmaps to XOR together- Returns:
- result of the XOR
-
orCardinality
public static int orCardinality(EWAHCompressedBitmap32... bitmaps)
Returns the cardinality of the result of a bitwise OR of the values of the provided bitmaps. Avoids needing to allocate an intermediate bitmap to hold the result of the OR.- Parameters:
bitmaps
- bitmaps to OR- Returns:
- the cardinality
-
-