Note: This is a beta release of Red Hat Bugzilla 5.0. The data contained within is a snapshot of the live data so any changes you make will not be reflected in the production Bugzilla. Also email is disabled so feel free to test any aspect of the site that you want. File any problems you find or give feedback here.
Bug 1516950 - Data inconsistency for CMR when connection is broken on oracle.jdbc.ReadTimeout Timeout after database resource commits
Summary: Data inconsistency for CMR when connection is broken on oracle.jdbc.ReadTimeo...
Keywords:
Status: CLOSED NOTABUG
Alias: None
Product: JBoss Enterprise Application Platform 6
Classification: JBoss
Component: JCA
Version: 6.4.0
Hardware: x86_64
OS: Linux
unspecified
medium
Target Milestone: ---
: ---
Assignee: jboss-set
QA Contact: Jiří Bílek
URL:
Whiteboard:
Depends On:
Blocks:
TreeView+ depends on / blocked
 
Reported: 2017-11-23 16:24 UTC by Radek Rodak
Modified: 2017-12-19 09:56 UTC (History)
6 users (show)

Doc Type: If docs needed, set a value
Doc Text:
Clone Of: 1181132
Environment:
Last Closed: 2017-12-19 09:56:56 UTC
Type: Bug


Attachments (Terms of Use)


Links
System ID Priority Status Summary Last Updated
JBoss Issue Tracker JBJCA-1244 Major Closed LocalXAResourceImpl returns RBROLLBACK in certain scenarios which violate the XA specification 2018-06-26 06:56:06 UTC
JBoss Issue Tracker WFLY-4272 Major Closed Data inconsistency for CMR when connection is broken after database resource commits 2018-06-26 06:56:06 UTC

Description Radek Rodak 2017-11-23 16:24:06 UTC
+++ This bug was initially created as a clone of Bug #1181132 +++


--- Additional comment from Radek Rodak on 2017-11-21 07:35:33 EST ---

Hi Tom

You asked for more codes with CMR.

When I have put oracle.jdbc.ReadTimeout Property on CMR Datasource, I saw this Exceptions, when ReadTimeout has closed JDBC Connection:

What do you think, is this expected behaviour:

<datasource jta="true" jndi-name="java:/jdbc/DS" pool-name="DS" enabled="true" use-java-context="true" connectable="true" statistics-enabled="true">
                    <connection-url>jdbc:oracle:thin:@Test14:1521:TEST1</connection-url>
                    <datasource-class>oracle.jdbc.OracleDriver</datasource-class>
                    <connection-property name="oracle.jdbc.ReadTimeout">
                        5000
                    </connection-property>
                    <connection-property name="oracle.net.CONNECT_TIMEOUT">
                        5000
                    </connection-property>
                    <driver>OracleJDBCDriver</driver>
                    <pool>
                        <min-pool-size>2</min-pool-size>
                        <max-pool-size>10</max-pool-size>
                        <prefill>true</prefill>
                        <use-strict-min>true</use-strict-min>
                        <flush-strategy>FailingConnectionOnly</flush-strategy>
                    </pool>
                    <security>
                        <security-domain>EncryptedPasswordDS</security-domain>
                    </security>
                    <validation>
                        <valid-connection-checker class-name="org.jboss.jca.adapters.jdbc.extensions.oracle.OracleValidConnectionChecker"/>
                        <validate-on-match>false</validate-on-match>
                        <background-validation>true</background-validation>
                        <background-validation-millis>900000</background-validation-millis>
                        <use-fast-fail>false</use-fast-fail>
                        <stale-connection-checker class-name="org.jboss.jca.adapters.jdbc.extensions.oracle.OracleStaleConnectionChecker"/>
                        <exception-sorter class-name="org.jboss.jca.adapters.jdbc.extensions.oracle.OracleExceptionSorter"/>
                    </validation>
                    <timeout>
                        <query-timeout>180</query-timeout>
                    </timeout>
                    <statement>
                        <prepared-statement-cache-size>10</prepared-statement-cache-size>
                        <share-prepared-statements>true</share-prepared-statements>
                    </statement>
                </datasource>
	
By forcing ReadTimeout after 5s I saw this Exceptions : 	
		    

2017-11-20 21:16:35,047 WARN  [org.jboss.jca.core.connectionmanager.listener.TxConnectionListener] (Thread 448_639930413) IJ000305: Connection error occured: org.jboss.jca.core.connectionmanager.listener.TxConnectionListener@324201c8[state=NORMAL managed connection=org.jboss.jca.adapters.jdbc.local.LocalManagedConnection@37ea6d52 connection handles=1 lastUse=1511208182337 trackByTx=true pool=org.jboss.jca.core.connectionmanager.pool.strategy.PoolBySubject@5282118e pool internal context=SemaphoreArrayListManagedConnectionPool@7b06cd7[pool=DS] xaResource=LocalXAResourceImpl@8271568[connectionListener=324201c8 connectionManager=387ff920 warned=false currentXid=< formatId=131077, gtrid_length=36, bqual_length=36, tx_uid=0:ffff0aa207ec:-237e2b6:5a12e7e9:cfce2d, node_name=nodeApp1, branch_uid=0:ffff0aa207ec:-237e2b6:5a12e7e9:cfce2f, subordinatenodename=null, eis_name=java:/jdbc/DS > productName=Oracle productVersion=Oracle Database 12c Enterprise Edition Release 12.1.0.2.0 - 64bit Production
With the Partitioning, OLAP, Advanced Analytics and Real Application Testing options jndiName=java:/jdbc/DS] txSync=null]: java.sql.SQLRecoverableException: IO Error: Socket read timed out
        at oracle.jdbc.driver.T4CPreparedStatement.executeForRows(T4CPreparedStatement.java:967)
        at oracle.jdbc.driver.OracleStatement.doExecuteWithTimeout(OracleStatement.java:1150)
        at oracle.jdbc.driver.OraclePreparedStatement.executeInternal(OraclePreparedStatement.java:4798)
        at oracle.jdbc.driver.OraclePreparedStatement.executeUpdate(OraclePreparedStatement.java:4875)
        at oracle.jdbc.driver.OraclePreparedStatementWrapper.executeUpdate(OraclePreparedStatementWrapper.java:1361)
        at org.jboss.jca.adapters.jdbc.CachedPreparedStatement.executeUpdate(CachedPreparedStatement.java:114)
        at org.jboss.jca.adapters.jdbc.WrappedPreparedStatement.executeUpdate(WrappedPreparedStatement.java:493)
	
2017-11-20 21:16:35,059 WARN  [com.arjuna.ats.jta] (Thread 448_639930413) ARJUNA016045: attempted rollback of < formatId=131077, gtrid_length=36, bqual_length=36, tx_uid=0:ffff0aa207ec:-237e2b6:5a12e7e9:cfce2d, node_name=nodeApp1, branch_uid=0:ffff0aa207ec:-237e2b6:5a12e7e9:cfce2f, subordinatenodename=null, eis_name=java:/jdbc/DS > (LocalXAResourceImpl@8271568[connectionListener=324201c8 connectionManager=387ff920 warned=false currentXid=null productName=Oracle productVersion=Oracle Database 12c Enterprise Edition Release 12.1.0.2.0 - 64bit Production
With the Partitioning, OLAP, Advanced Analytics and Real Application Testing options jndiName=java:/jdbc/DS]) failed with exception code XAException.XAER_RMFAIL: org.jboss.jca.core.spi.transaction.local.LocalXAException: IJ001160: Could not rollback local transaction
        at org.jboss.jca.core.tx.jbossts.LocalConnectableXAResourceImpl.rollback(LocalConnectableXAResourceImpl.java:111)
        at com.arjuna.ats.internal.jta.resources.arjunacore.CommitMarkableResourceRecord.topLevelAbort(CommitMarkableResourceRecord.java:453)
        at com.arjuna.ats.arjuna.coordinator.BasicAction.doAbort(BasicAction.java:2901)
        at com.arjuna.ats.arjuna.coordinator.BasicAction.doAbort(BasicAction.java:2880)
        at com.arjuna.ats.arjuna.coordinator.BasicAction.Abort(BasicAction.java:1645)
        at com.arjuna.ats.arjuna.coordinator.TwoPhaseCoordinator.cancel(TwoPhaseCoordinator.java:118)
        at com.arjuna.ats.arjuna.AtomicAction.abort(AtomicAction.java:186)
        at com.arjuna.ats.internal.jta.transaction.arjunacore.TransactionImple.rollbackAndDisassociate(TransactionImple.java:1313)
        at com.arjuna.ats.internal.jta.transaction.arjunacore.BaseTransaction.rollback(BaseTransaction.java:143)
        at com.arjuna.ats.jbossatx.BaseTransactionManagerDelegate.rollback(BaseTransactionManagerDelegate.java:114)
        at org.jboss.tm.usertx.client.ServerVMClientUserTransaction.rollback(ServerVMClientUserTransaction.java:202)
	
2017-11-20 21:16:35,067 WARN  [com.arjuna.ats.arjuna] (Thread 448_639930413) ARJUNA012091: Top-level abort of action 0:ffff0aa207ec:-237e2b6:5a12e7e9:cfce2d received TwoPhaseOutcome.FINISH_ERROR from com.arjuna.ats.internal.jta.resources.arjunacore.CommitMarkableResourceRecord

--- Additional comment from  on 2017-11-23 10:28:33 EST ---

Hi Radek,

That does look like an issue because the DB may have rolled back or not and IronJacamar could not infer the outcome. 

Please can you assign to JCA and report the issue against all affected products.

Thanks,
Tom

--- Additional comment from  on 2017-11-23 10:31:44 EST ---

However, I don't think your simulated error would be in the right place actually. Because you need that timeout to happen when the LocalConnectableXAResourceImpl is issuing the commit call - which I don't think it is in your java.sql.SQLRecoverableException: IO Error: Socket read timed out

--- Additional comment from Radek Rodak on 2017-11-23 10:57:48 EST ---

I agree with you. In this case it will fail before 2-end Phase. But I think Periodic Recovery might have problem with recovering, what is left in object_store...

Comment 1 Ondrej Chaloupka 2017-12-14 11:14:55 UTC
Hi Radek,

Tom asked me to investigate on this issue and I have few questions to check with you.

I have created some reproducer, trying to understand the issue but I'm not 100% successful. I do not understand exact scenario that causes the data inconsistency. Would you be so kind and elaborate a bit more on it?

Referring to your post at the https://bugzilla.redhat.com/show_bug.cgi?id=1181132

If I understand you could see the issue at two cases, am I right? Those are what you says at bz#1181132

* 'We are hitting exactly this case with CMR on Oracle Datasource, where we are getting RMFAIL when Oracle Database is shutdown with "shutdown immediate"'
* 'I have put oracle.jdbc.ReadTimeout Property on CMR Datasource, I saw this Exceptions, when ReadTimeout has closed JDBC Connection'

Maybe they are two different issues (each for each case)? 

For the second case - is that that during SQL update/insert execution the read timeout is lower than the time needed to execute and the execution is rejected. Then you get 'java.sql.SQLRecoverableException: IO Error: Socket read timed out', which results to global transaction is set up 'TwoPhaseOutcome.FINISH_ERROR'. Here from my testing here the whole transactions with all resources is aborted. Which is correct. There are only some errors in the log.
Do you have experience that the IBM MQ is (in this read timeout case) committed?

If I understand right the case where Oracle Database is shutdown with "shutdown immediate", the behavior is as expectd (heuristic rollback is reported, manual resolution is needed but that's what is the CMR capability to do). And you resolved the issue of need to manually resolve branch by using XA, right?

Thank you for you input
Ondra

Comment 2 Radek Rodak 2017-12-14 14:29:11 UTC
Hi Ondrej

Only thing need to be done is just to add Error Code Number to private static final List<String> RMFAIL_SQL_ERRORCODES in core/src/main/java/org/jboss/jca/core/tx/jbossts/LocalConnectableXAResourceImpl.java for java.sql.SQLRecoverableException: IO Error: Socket read timed out. ( you need to check which number it is )

This should be good enough, because it works already for Oracle: 08000 (No more data to read from socket)

With CRM, Global Transaction in 2-end Phase needs to end up with  XAException.XAER_RMFAIL with CMR Ressource and not automatic Rollback, because it can also be, that Oracle was able commit short before connection is gone.

Analog what Tom did here   https://github.com/ironjacamar/ironjacamar/pull/278/commits/ff62b8b23f59f9fbb9c15be40fef38efb872c436

What is more difficult is to create reproducer for this....
What you need to do is to throw System Exception "java.sql.SQLRecoverableException: IO Error: Socket read timed out" in 2-end Phase with CMR Datasource, and you should get 

throw new LocalXAException(bundle.couldNotCommitLocalTx(), XAException.XAER_RMFAIL, re);

and not Rollback.

The reproducer should find also Transaction in status "Heuristic" is store, after Periodic Recovery has run.

Comment 3 Radek Rodak 2017-12-14 14:30:32 UTC
... sorry typo... instead CRM, it should be CMR :-)

Comment 4 Radek Rodak 2017-12-14 14:35:19 UTC
and if same happened in 1-phase before 2-end phase, you should NOT have XAException.XAER_RMFAIL but Rollback... I'm not sure were exactly there is boundary, I think Tom know it by heart :-)

Comment 5 Ondrej Chaloupka 2017-12-14 14:43:09 UTC
Hi Radek,

thank you for the explanation. I think I've created this kind of reproducer. And everything works just fine.

Can you please tell me what is JBoss version you work with?
(what is the jboss version this bug is reported against)

Btw. The Tom's pull request: https://github.com/ironjacamar/ironjacamar/pull/278 was not accepted and IronJacamar handles the errors in the extension (needed to be defined in xa-datasource configuration which you have, as I can see): https://github.com/ironjacamar/ironjacamar/blob/1.4/adapters/src/main/java/org/jboss/jca/adapters/jdbc/extensions/oracle/OracleExceptionSorter.java#L59 (branch 1.4 is in EAP 7.1, branch 1.0 is in JBoss EAP 6.4 - https://github.com/ironjacamar/ironjacamar/blob/1.0/adapters/src/main/java/org/jboss/jca/adapters/jdbc/extensions/oracle/OracleExceptionSorter.java)

Agree that in 1-phase there should be got the rollback. That's something I haven't checked but it works for before 1-phase fine (that is what I was playing with).

Thank you again
Ondra

Comment 6 Radek Rodak 2017-12-14 14:46:32 UTC
Enliste IBM MQ Ressource have to be set to full XA, only one Ressource in Global Tx  can be used with CMR. Which means IBM MQ JMS Connection Factory need to have transaction parameter set to XA ( LocalTransaction would be wrong for this case ).
There is also an issue with IBM MQ Client Version, before IBM MQ 7.5.0.6, I tested it with IBM MQ wmq rar from 7.5.0.8.  With older Version in  some cases there is an issue with uncommited Messages on IBM MQ QueueManager side, even after XA timeout ...

Comment 7 Radek Rodak 2017-12-14 14:49:08 UTC
I used EAP 6.4.17... you need at least 6.4.9 I thing, to have the fix from Tom which does the XAException.XAER_RMFAIL for for Oracle: 08000 (No more data to read from socket)

Comment 8 Radek Rodak 2017-12-14 15:03:37 UTC
Thanks you for the hint... I was not following what happend to Tom's commit after.
You right, there is even more codes inside OracleExceptionSorter.isExceptionFatal()

.... oh looks like this case is already inside :-)

 if ((errorCode < 20000 || errorCode >= 21000) &&
          ((errorText.indexOf("SOCKET") > -1) 

... now the Question is if errorCode for "java.sql.SQLRecoverableException: IO Error: Socket read timed out"

is NOT between 20000 - 21000 .... because if errorCode will be in this range, isExceptionFatal will return false and not true...

Mabe you have to ask jesper, what he thinks :-)

Cheers Radek

Comment 9 Radek Rodak 2017-12-14 15:21:00 UTC
if the errorCode is not in rage 20000 - 21000 , then reproducer should already work without any changes to source code. :-)
in this case in 2-end phase you should see Heuristic Transaction on  "java.sql.SQLRecoverableException: IO Error: Socket read timed out" and not Rollback :-)

Comment 10 Radek Rodak 2017-12-14 15:26:24 UTC
And if it works, we can close this ticket with, "java.sql.SQLRecoverableException: IO Error: Socket read timed out" already correctly handled with CMR :-)

Comment 11 Ondrej Chaloupka 2017-12-14 15:32:18 UTC
Ok, I see.

The XAException.XAER_RMFAIL should be thrown for the Oracle error code '08000' from EAP 6.4.0.GA release (https://github.com/ironjacamar/ironjacamar/commit/7fda1ec4b40031e93a40482fa9eeb82c6a41ae49). You need to have defined the ExceptionSorter in the xa datsource configuration (<exception-sorter class-name="org.jboss.jca.adapters.jdbc.extensions.oracle.OracleExceptionSorter"/>).

The error code that you looks for is here (I think)
https://github.com/ironjacamar/ironjacamar/blob/1.0/adapters/src/main/java/org/jboss/jca/adapters/jdbc/extensions/oracle/OracleExceptionSorter.java#L87


The scenario you describe works fine for me.

I tend to think that this issue could be closed, if you agree.

Thanks
Ondra

Comment 12 Ondrej Chaloupka 2017-12-19 09:56:56 UTC
I'm closing this as my investigation indicates this is not a bug. Radek, please, feel free to reopen or create new issue if shows otherwise.


Note You need to log in before you can comment on or make changes to this bug.